Muhammad Waqar Azhar

ORCID: 0000-0003-0477-4540
Research Areas
  • Parallel Computing and Optimization Techniques
  • Cloud Computing and Resource Management
  • Advanced Neural Network Applications
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • CCD and CMOS Imaging Sensors
  • Advanced Memory and Neural Computing
  • Low-power high-performance VLSI design
  • Generative Adversarial Networks and Image Synthesis
  • Radiation Effects in Electronics
  • Advanced Image and Video Retrieval Techniques
  • Power Line Communications and Noise
  • Transportation and Mobility Innovations
  • Telecommunications and Broadcasting Technologies
  • Advanced MIMO Systems Optimization
  • 3D Shape Modeling and Analysis
  • Digital Media Forensic Detection
  • Quantum Computing Algorithms and Architecture
  • Network Traffic and Congestion Control
  • Embedded Systems and FPGA Design
  • Interconnection Networks and Systems
  • IoT and Edge/Fog Computing
  • Green IT and Sustainability
  • Impact of Light on Environment and Health
  • Error Correcting Code Techniques

Chalmers University of Technology
2010-2024

University of Sargodha
2019

Institut Teknologi Telkom Purwokerto
2018

NED University of Engineering and Technology
2018

Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention. These CNNs have relatively low computational and memory requirements. A common denominator among such CNNs is having more heterogeneity than traditional CNNs. This heterogeneity is present at two levels: intra-layer type and inter-layer type. Generic accelerators do not capture these levels of heterogeneity, which harms their efficiency. Consequently, researchers have proposed model-specific accelerators with dedicated engines. When designing an accelerator...

10.1145/3639823 article EN ACM Transactions on Architecture and Code Optimization 2024-01-08

Quantum computing has the potential to revolutionize multiple fields by solving complex problems that cannot be solved in reasonable time with current classical computers. Nevertheless, the development of quantum computers is still in its early stages and the available systems have very limited resources. As such, currently, the most practical way to develop and test algorithms is to use simulators. In addition, the development of new quantum computers and their components also depends on simulations. Given the characteristics of a quantum computer, simulation is demanding...

10.48550/arxiv.2410.12660 preprint EN arXiv (Cornell University) 2024-10-16

Improving energy efficiency is an important goal of computer system design. This article focuses on a general model of task-parallel applications under quality-of-service requirements on the completion time. Our technique, called Task-RM, exploits the variance in task execution times and the imbalance between tasks to allocate just enough resources, in terms of voltage-frequency and core allocation, so that the application completes before the deadline. Moreover, we provide a solution that can harness additional savings with...

10.1145/3494537 article EN ACM Transactions on Architecture and Code Optimization 2022-01-23
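
The idea of allocating "just enough" voltage-frequency to finish before a deadline can be illustrated with a minimal sketch. This is not the Task-RM algorithm itself: the function name, the frequency list, and the assumption that run time scales as cycles divided by frequency are all hypothetical simplifications introduced for illustration.

```python
from typing import Optional, Sequence


def lowest_frequency_meeting_deadline(
    work_cycles: float, deadline_s: float, freqs_hz: Sequence[float]
) -> Optional[float]:
    """Return the lowest frequency whose estimated run time fits the deadline."""
    for f in sorted(freqs_hz):
        # Simplifying assumption: run time = cycles / frequency.
        if work_cycles / f <= deadline_s:
            return f
    return None  # no available setting meets the deadline


# Hypothetical workload: 3e9 cycles, 2.5 s deadline, three frequency levels.
choice = lowest_frequency_meeting_deadline(3e9, 2.5, [1e9, 1.5e9, 2e9])
print(choice)
```

Picking the lowest feasible frequency reflects the common observation that lower voltage-frequency settings tend to consume less energy; a real resource manager would also account for core allocation and run-time variance.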

Most systems allocate computational resources to each executing task without any actual knowledge of the application's Quality-of-Service (QoS) requirements. Such best-effort policies lead to overprovisioning and increase energy loss. This work assumes applications with soft QoS requirements and exploits the inherent timing slack to minimize the allocated resources and reduce energy consumption. We propose a lightweight progress-tracking methodology based on the outer loops of application kernels. It builds an online history and uses it...

10.1145/3148053 article EN ACM Transactions on Architecture and Code Optimization 2017-12-05

The growing importance of energy-efficient networks with high data rate requirements is a major concern for network operators. Services provided by the operators are required to ensure consumers' satisfaction. To provide high data rates with good signal quality, small cells are deployed. But these cells can increase energy consumption if they are not equipped with some intelligent power saving or distribution mechanism. In this paper, a previously tested cell sleeping mode scheme is compared with a newly proposed scheme for reducing energy consumption in low and normal traffic...

10.48084/etasr.1623 article EN cc-by Engineering Technology & Applied Science Research 2018-04-19

We present a novel architecture for a lightweight Viterbi accelerator that can be tightly integrated inside an embedded processor datapath. We investigate the accelerator's impact on performance by using the EEMBC benchmark and an in-house Branch Metric kernel. Our evaluation shows that an accelerated 65-nm, 2.7-ns datapath is 20% larger but 90% more cycle efficient than a datapath lacking the accelerator, leading to an 87% overall energy reduction and a data throughput of 3.52 Mbit/s.

10.1109/asap.2012.24 article EN 2012-07-01

Reducing the energy to carry out computational tasks is key to almost any computing application. We focus in this paper on iterative applications that have explicit deadlines per iteration. Our objective is to meet the deadlines while minimizing energy. We leverage the vast configuration space offered by heterogeneous multicore platforms, which typically expose three dimensions for energy-saving configurability: voltage/frequency levels, thread count, and core type (e.g. ARM big/LITTLE). We note that when choosing the most energy-efficient...

10.1145/3337821.3337865 article EN 2019-07-25

Reducing energy consumption while providing performance and quality guarantees is crucial for computing systems ranging from battery-powered embedded systems to data centers. This article considers approximate iterative applications executing on heterogeneous multi-core platforms under user-specified targets. We note that allowing a slight yet bounded relaxation in solution quality can considerably reduce the required iteration count and thereby save significant amounts of energy. To this end, this article proposes...

10.1145/3605214 article EN ACM Transactions on Architecture and Code Optimization 2023-06-22

Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, making their memory accesses often the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion...

10.48550/arxiv.2404.19331 preprint EN arXiv (Cornell University) 2024-04-30
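
The claim that depthwise and pointwise convolutions have fewer parameters than a standard convolution can be checked with a short worked calculation. The sketch below uses the textbook weight-count formulas and ignores biases; the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_pointwise_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_pointwise_params(3, 64, 128)
print(std, sep, round(std / sep, 1))
```

Fewer weights and operations per byte moved is also why such layers tend to be limited by memory access rather than compute, which is the motivation for fusing them.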

The success of Artificial Intelligence (AI) applications is driven by efficient hardware accelerators. Recent trends show a rapid increase in application demands, which in most cases surpass the available resources. As such, the management of these limited resources becomes a critical factor in achieving high performance.

10.1145/3673038.3673115 article EN cc-by 2024-08-08

Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, making their memory accesses often the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion...

10.1145/3677333.3678153 article EN cc-by 2024-08-09

The VEDLIoT project aims to develop energy-efficient Deep Learning methodologies for distributed Artificial Intelligence of Things (AIoT) applications. During our project, we propose a holistic approach that focuses on optimizing algorithms while addressing the safety and security challenges inherent to AIoT systems. The foundation of this approach lies in a modular and scalable cognitive IoT hardware platform, which leverages microserver technology to enable users to configure the hardware to meet the requirements of a diverse array...

10.1145/3587135.3592175 preprint EN 2023-05-09

Deep Learning (DL) applications are entering every part of our life given their ability to solve complex problems. Nevertheless, energy efficiency is still a major concern due to the large computational and memory requirements. State-of-the-art accelerators strive to address this issue by optimizing the architecture to the compute and memory requirements of DL algorithms. However, there is always a mismatch between what is offered and a particular design. A way to close this gap is by providing run-time adaptation or resource allocation to improve efficiency.

10.1145/3587135.3592207 article EN 2023-05-09

A proven approach to increase the performance of general-purpose processors is to add hardware accelerators. In its basic configuration, the FlexCore processor has a limited set of datapath units. But thanks to a flexible interconnect and a wide control word, it is explicitly designed to support the integration of special units that, on demand, can accelerate certain data-intensive applications. We present a versatile accelerator for several Cyclic Redundancy Checking (CRC) keys. Furthermore, we investigate the accelerator's...

10.1109/dsd.2010.51 article EN 2010-09-01
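
As a software point of reference for what a CRC accelerator computes, here is a minimal bitwise sketch. It assumes that the "keys" in the abstract are generator polynomials, and it uses a plain MSB-first, non-reflected CRC with no final XOR; the function name and parameters are hypothetical, and the paper's hardware design is not reproduced here.

```python
def crc_msb_first(data: bytes, poly: int, width: int, init: int = 0) -> int:
    """Bitwise MSB-first CRC (no reflection, no final XOR); requires width >= 8."""
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    crc = init & mask
    for byte in data:
        # Align the next input byte with the top of the CRC register.
        crc ^= byte << (width - 8)
        for _ in range(8):
            if crc & top_bit:
                crc = ((crc << 1) ^ poly) & mask  # shift out the top bit, apply polynomial
            else:
                crc = (crc << 1) & mask
    return crc


# The same routine handles different polynomials ("keys") by changing parameters.
message = b"123456789"
print(hex(crc_msb_first(message, poly=0x07, width=8)))
print(hex(crc_msb_first(message, poly=0x1021, width=16, init=0xFFFF)))
```

The inner loop processes one bit per step, which is the kind of data-intensive work a dedicated datapath unit could take over.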

Seeking the "sweet spot" in the accuracy-efficiency trade-off is increasing the heterogeneity of state-of-the-art Convolutional Neural Networks (CNNs). Such CNN models exhibit heterogeneity at two levels: intra- and inter-layer-type. Generic accelerators do not capture these levels of heterogeneity. Consequently, researchers have proposed model-specific accelerators with dedicated modules or engines. These accelerators belong to two categories at the ends of the design spectrum. In the first category, accelerators contain a minimal number of engines such that all layers of one type...

10.1109/sbac-pad55451.2022.00029 article EN 2022-11-01

Today, both humans and information processing systems need rapid access to anything they want on the internet. To fulfill these needs, more internet service providers with a large amount of bandwidth are entering the market. For these providers, a lot of bandwidth is free during off-peak hours, while during peak hours the total available bandwidth might be insufficient. The primary purpose of our research is to divide and distribute the excess bandwidth among users to attain maximum user satisfaction. In order to do this, a dynamic excess rate (DER) scheme and its...

10.14569/ijacsa.2019.0100571 article EN International Journal of Advanced Computer Science and Applications 2019-01-01

In order to deliver high performance efficiently, modern processors include dedicated hardware to accelerate different application domains. For example, several recent processors include Machine Learning (ML) accelerators. However, while adding dedicated hardware improves efficiency compared to general-purpose CPUs, it also requires a larger area, making it unfeasible for smaller devices. Therefore, exploring ways to use the existing functionalities becomes desirable in those setups. In this work, we explore the reuse of components of the Vector...

10.1109/iccd56317.2022.00061 article EN 2022 IEEE 40th International Conference on Computer Design (ICCD) 2022-10-01

Deep Learning (DL) is developing at an extremely fast pace. The increased number of applications, optimizations, and hardware devices available results in a multi-dimensional design space where the best performance is achieved with detailed analysis and a hardware-software co-design process. Furthermore, the high demands for memory and the latency cost of off-chip accesses result in on-chip memory becoming critical for achieving efficiency. In this work, we propose RAINBOW, a tool to assist the design of DL accelerators' memory. The purpose is to help and/or...

10.1109/ispass57527.2023.00050 article EN 2023-04-01

In order to meet the increased computational demands and stricter power constraints of modern applications, architectures have evolved to include domain-specific accelerators. To design efficient accelerators, three main challenges need to be addressed: compute, memory, and control. Moreover, since SoCs usually contain multiple accelerators, selecting the right one for each task has also become crucial. This becomes especially relevant in Flexible Processing Units (xPUs), processing units that provide multiple functionalities with the same...

10.1109/sbac-pad59825.2023.00013 article EN 2023-10-17

Telecommunication services play a very important role in modern life. LTE technology is developing in large cities. As operators, providers need backhaul that is reliable yet also efficient in terms of both transmission and capacity. Backhaul is a link that connects a Base Station to another Base Station or to the core network in order to carry that traffic. This study discusses the analysis of Long Term Evolution network planning in the city of Yogyakarta, using microwave as the technology...

10.30595/techno.v19i2.3010 article ID cc-by Techno (Jurnal Fakultas Teknik Universitas Muhammadiyah Purwokerto) 2018-10-31