NFDI4DS | UHH-SEMS - Publication Details

An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs

OPENALEX - Publications

Fareed Qararyah Muhammad Waqar Azhar Pedro Trancoso

Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention. These CNNs have relatively low computational and memory requirements. A common denominator among such CNNs is having more heterogeneity than traditional CNNs. This heterogeneity is present at two levels: intra-layer type and inter-layer type. Generic accelerators do not capture these levels of heterogeneity, which harms their efficiency. Consequently, researchers have proposed model-specific accelerators with dedicated engines. When designing an accelerator...

10.1145/3639823 article EN ACM Transactions on Architecture and Code Optimization 2024-01-08

Simulation of Quantum Computers: Review and Acceleration Opportunities

Alessio Cicero Mohammad Ali Maleki Muhammad Waqar Azhar Anton Frisk Kockum Pedro Trancoso

Quantum computing has the potential to revolutionize multiple fields by solving complex problems that cannot be solved in reasonable time with current classical computers. Nevertheless, the development of quantum computers is still in its early stages and the available systems have very limited resources. As such, currently, the most practical way to develop and test quantum algorithms is to use simulators. In addition, the development of new quantum computers and their components also depends on simulations. Given the characteristics of a quantum computer, simulation is demanding...

10.48550/arxiv.2410.12660 preprint EN arXiv (Cornell University) 2024-10-16

Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints

Muhammad Waqar Azhar Miquel Pericàs Per Stenström

Improving energy efficiency is an important goal of computer system design. This article focuses on a general model of task-parallel applications under quality-of-service requirements on the completion time. Our technique, called Task-RM, exploits the variance in task execution times and the imbalance between tasks to allocate just enough resources, in terms of voltage-frequency and core allocation, so that the application completes before the deadline. Moreover, we provide a solution that can harness additional savings with...

10.1145/3494537 article EN ACM Transactions on Architecture and Code Optimization 2022-01-23

SLOOP

Muhammad Waqar Azhar Per Stenström Vassilis Papaefstathiou

Most systems allocate computational resources to each executing task without any actual knowledge of the application’s Quality-of-Service (QoS) requirements. Such best-effort policies lead to overprovisioning and increase energy loss. This work assumes applications with soft QoS requirements and exploits the inherent timing slack to minimize the allocated resources and reduce energy consumption. We propose a lightweight progress-tracking methodology based on the outer loops of application kernels. It builds an online history and uses it...

10.1145/3148053 article EN ACM Transactions on Architecture and Code Optimization 2017-12-05

5G Networks: Challenges and Techniques for Energy Efficiency

Muhammad Waqar Azhar Amna Shabbir

The growing importance of energy-efficient networks with high data rate requirements is a major concern for network operators. Services provided by the operators are required to ensure consumers’ satisfaction. To provide high data rates with good signal quality, small cells are deployed. But these cells can increase energy consumption if they are not equipped with some intelligent power saving or distribution mechanism. In this paper, a previously tested cell sleeping mode scheme is compared with a newly proposed scheme for reducing energy consumption in low and normal traffic...

10.48084/etasr.1623 article EN cc-by Engineering Technology & Applied Science Research 2018-04-19

Viterbi Accelerator for Embedded Processor Datapaths

Muhammad Waqar Azhar Magnus Själander Hasan Ali Akshay Vijayashekar Tung Thanh Hoang and 2 more

We present a novel architecture for a lightweight Viterbi accelerator that can be tightly integrated inside an embedded processor datapath. We investigate the accelerator's impact on performance by using the EEMBC benchmark and an in-house Branch Metric kernel. Our evaluation based on the EEMBC benchmark shows that the accelerated 65-nm 2.7-ns datapath is 20% larger but 90% more cycle efficient than a datapath lacking the accelerator, leading to an 87% overall energy reduction and a data throughput of 3.52 Mbit/s.

10.1109/asap.2012.24 article EN 2012-07-01

SaC

Muhammad Waqar Azhar Miquel Pericàs Per Stenström

Reducing the energy to carry out computational tasks is key to almost any computing application. We focus in this paper on iterative applications that have explicit deadlines per iteration. Our objective is to meet the deadlines while minimizing energy. We leverage the vast configuration space offered by heterogeneous multicore platforms, which typically expose three dimensions for energy-saving configurability: voltage/frequency levels, thread count and core type (e.g. ARM big/LITTLE). We note that when choosing the most energy-efficient...

10.1145/3337821.3337865 article EN 2019-07-25

Approx-RM: Reducing Energy on Heterogeneous Multicore Processors under Accuracy and Timing Constraints

Muhammad Waqar Azhar Madhavan Manivannan Per Stenström

Reducing energy consumption while providing performance and quality guarantees is crucial for computing systems ranging from battery-powered embedded systems to data centers. This article considers approximate iterative applications executing on heterogeneous multi-core platforms under user-specified targets. We note that allowing a slight yet bounded relaxation in solution quality can considerably reduce the required iteration count and thereby save significant amounts of energy. To this end, this article proposes...

10.1145/3605214 article EN ACM Transactions on Architecture and Code Optimization 2023-06-22

Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs

Fareed Qararyah Muhammad Waqar Azhar Mohammad Ali Maleki Pedro Trancoso

Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, making their memory accesses often the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion...

10.48550/arxiv.2404.19331 preprint EN arXiv (Cornell University) 2024-04-30

DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators

Piyumal Ranawaka Muhammad Waqar Azhar Per Stenström

10.1145/3649153.3649196 article EN 2024-05-07

Scratchpad Memory Management for Deep Learning Accelerators

Stavroula Zouzoula Mohammad Ali Maleki Muhammad Waqar Azhar Pedro Trancoso

The success of Artificial Intelligence (AI) applications is driven by efficient hardware accelerators. Recent trends show a rapid increase in application demands, which in most cases surpass the available resources. As such, the management of these limited resources becomes a critical factor in achieving high performance.

10.1145/3673038.3673115 article EN cc-by 2024-08-08

Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs

Fareed Qararyah Muhammad Waqar Azhar Mohammad Ali Maleki Pedro Trancoso

Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, making their memory accesses often the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion...

10.1145/3677333.3678153 article EN cc-by 2024-08-09

VEDLIoT

K. Mika René Griessl Nils Kucza Florian Porrmann M. Shamim Kaiser and 27 more

The VEDLIoT project aims to develop energy-efficient Deep Learning methodologies for distributed Artificial Intelligence of Things (AIoT) applications. During our project, we propose a holistic approach that focuses on optimizing algorithms while addressing safety and security challenges inherent to AIoT systems. The foundation of this approach lies in a modular and scalable cognitive IoT hardware platform, which leverages microserver technology to enable users to configure the hardware to meet the requirements of a diverse array...

10.1145/3587135.3592175 preprint EN 2023-05-09

ARADA

Muhammad Waqar Azhar Stavroula Zouzoula Pedro Trancoso

Deep Learning (DL) applications are entering every part of our life given their ability to solve complex problems. Nevertheless, energy efficiency is still a major concern due to the large computational and memory requirements. State-of-the-art accelerators strive to address this issue by optimizing the architecture for the compute requirements of DL algorithms. However, there is always a mismatch between what is required and what is offered by a particular design. A way to close this gap is providing run-time adaptation or resource allocation to improve efficiency.

10.1145/3587135.3592207 article EN 2023-05-09

Cyclic Redundancy Checking (CRC) Accelerator for the FlexCore Processor

Muhammad Waqar Azhar Tung Thanh Hoang Per Larsson-Edefors

A proven approach to increase the performance of general-purpose processors is to add hardware accelerators. In its basic configuration, the FlexCore processor has a limited set of datapath units. But thanks to a flexible interconnect and a wide control word, it is explicitly designed to support the integration of special units that, on demand, can accelerate certain data-intensive applications. We present a versatile accelerator for several Cyclic Redundancy Checking (CRC) keys. Furthermore, we investigate the accelerator's...

10.1109/dsd.2010.51 article EN 2010-09-01

FiBHA: Fixed Budget Hybrid CNN Accelerator

Fareed Qararyah Muhammad Waqar Azhar Pedro Trancoso

Seeking the "sweet spot" in the accuracy-efficiency trade-off is increasing the heterogeneity of state-of-the-art Convolutional Neural Networks (CNNs). Such CNN models exhibit heterogeneity at two levels: intra- and inter-layer-type. Generic accelerators do not capture these levels of heterogeneity. Consequently, researchers have proposed model-specific accelerators with dedicated modules or engines. These accelerators belong to two categories at the two ends of the design spectrum. In the first category, accelerators contain a minimal number of engines such that all layers of one type...

10.1109/sbac-pad55451.2022.00029 article EN 2022-11-01

Dynamic Bandwidth Allocation in LAN using Dynamic Excess Rate Sensing

Muhammad Abubakar Muhammad Muhammad Waqar Azhar Abid Sultan Muhammad Afrasayab

Today, humans and information processing systems both need rapid access to anything they want on the internet. To fulfill these needs, more internet service providers with a large amount of bandwidth are entering the market. For these providers, a lot of bandwidth is free during off-peak hours, while in peak hours the total available bandwidth might be insufficient. The primary purpose of our research is to divide and distribute the excess bandwidth among the users to attain maximum user satisfaction. In order to do this, a dynamic excess rate (DER) scheme and its...

10.14569/ijacsa.2019.0100571 article EN International Journal of Advanced Computer Science and Applications 2019-01-01

VSA: A Hybrid Vector-Systolic Architecture

Mateo Vázquez Muhammad Waqar Azhar Pedro Trancoso

In order to deliver high performance efficiently, modern processors include dedicated hardware to accelerate different application domains. For example, several recent processors include Machine Learning (ML) accelerators. However, while adding dedicated hardware improves efficiency compared to general-purpose CPUs, it also requires a larger area, making it unfeasible for smaller devices. Therefore, exploring ways to use the existing functionalities becomes desirable in those setups. In this work, we explore the reuse of components of a Vector...

10.1109/iccd56317.2022.00061 article EN 2022 IEEE 40th International Conference on Computer Design (ICCD) 2022-10-01

RAINBOW: Multi-Dimensional Hardware-Software Co-Design for DL Accelerator On-Chip Memory

Stavroula Zouzoula Muhammad Waqar Azhar Pedro Trancoso

Deep Learning (DL) is developing at an extremely fast pace. The increased number of applications, optimizations and hardware devices available results in a multi-dimensional design space where the best performance is achieved with a detailed analysis and a hardware-software co-design process. Furthermore, the high demands for memory and the latency cost of off-chip memory result in on-chip memory becoming critical for achieving efficiency. In this work, we propose RAINBOW, a tool to assist in the design of DL accelerators' on-chip memory. The purpose of the tool is to help and/or...

10.1109/ispass57527.2023.00050 article EN 2023-04-01

Exploiting the Potential of Flexible Processing Units

Mateo Vázquez Muhammad Waqar Azhar Pedro Trancoso

In order to meet the increased computational demands and stricter power constraints of modern applications, architectures have evolved to include domain-specific accelerators. To design efficient accelerators, three main challenges need to be addressed: compute, memory, and control. Moreover, since SoCs usually contain multiple accelerators, selecting the right one for each task has also become crucial. This becomes especially relevant in Flexible Processing Units (xPUs), processing units that provide multiple functionalities with the same...

10.1109/sbac-pad59825.2023.00013 article EN 2023-10-17

Analisa Perencanaan Backhaul Untuk Jaringan Long Term Evolution (LTE) Dikota Yogyakarta

Muhammad Waqar Azhar Zein Hanni Pradana Ade Wahyudin

Telecommunication services play a very important role in modern life. LTE technology is developing in large cities. Operators need a backhaul that is reliable yet efficient in terms of both transmission and capacity. A backhaul is a path that connects a base station to another base station or to the core network in order to carry its traffic. This study discusses the analysis of backhaul planning for a Long Term Evolution network in the city of Yogyakarta. Using microwave as the technology...

10.30595/techno.v19i2.3010 article ID cc-by Techno (Jurnal Fakultas Teknik Universitas Muhammadiyah Purwokerto) 2018-10-31