Charbel Sakr

ORCID: 0000-0001-5641-0541
Research Areas
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Neural Networks and Applications
  • Adversarial Robustness in Machine Learning
  • Semiconductor materials and devices
  • Machine Learning and Algorithms
  • Radiation Effects in Electronics
  • Parallel Computing and Optimization Techniques
  • CCD and CMOS Imaging Sensors
  • Machine Learning and ELM
  • Machine Learning and Data Classification
  • Anomaly Detection Techniques and Applications
  • Sparse and Compressive Sensing Techniques
  • Fluid Dynamics Simulations and Interactions
  • Tensor decomposition and applications
  • Medical Image Segmentation Techniques
  • Robotics and Sensor-Based Localization
  • Real-time simulation and control systems
  • Image and Object Detection Techniques
  • Advanced Image and Video Retrieval Techniques
  • Embedded Systems Design Techniques
  • Speech Recognition and Synthesis
  • Stochastic Gradient Optimization Techniques
  • Brain Tumor Detection and Classification

Nvidia (United States)
2021-2023

Institut des NanoSciences de Paris
2022

Sorbonne Université
2022

Centre National de la Recherche Scientifique
2022

European Synchrotron Radiation Facility
2022

University of Illinois Urbana-Champaign
2017-2021

Nvidia (United Kingdom)
2020

The energy efficiency of deep neural network (DNN) inference can be improved with custom accelerators. DNN accelerators often employ specialized hardware techniques to improve efficiency, but many of these result in catastrophic accuracy loss on transformer-based DNNs, which have become ubiquitous for natural language processing (NLP) tasks. This article presents an accelerator designed for efficient execution of transformers. The proposed design implements per-vector scaled quantization (VSQ) and employs an...

10.1109/jssc.2023.3234893 article EN IEEE Journal of Solid-State Circuits 2023-01-18
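
A minimal NumPy sketch of per-vector scaled quantization (VSQ) as described above: each short vector of elements carries its own scale factor. The vector length and bit-width below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def vsq_quantize(x, bits=4, vec_len=16):
    """Per-vector scaled quantization sketch: each contiguous group of
    `vec_len` elements gets its own scale factor (illustrative parameters).
    Assumes x.size is divisible by vec_len."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit signed codes
    v = x.reshape(-1, vec_len)
    scale = np.abs(v).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)       # avoid division by zero
    q = np.clip(np.round(v / scale), -qmax, qmax)  # integer codes
    return q.astype(np.int8), scale

def vsq_dequantize(q, scale, shape):
    return (q * scale).reshape(shape)

x = np.random.randn(8, 64).astype(np.float32)
q, s = vsq_quantize(x)
err = np.abs(vsq_dequantize(q, s, x.shape) - x).mean()  # per-vector scaling keeps this small
```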

Convolutional neural networks (CNNs) have gained considerable interest due to their record-breaking performance in many recognition tasks. However, the computational complexity of CNNs precludes their deployment on power-constrained embedded platforms. In this paper, we propose the predictive CNN (PredictiveNet), which predicts the sparse outputs of non-linear layers, thereby bypassing a majority of computations. PredictiveNet skips a large fraction of convolutions at runtime without modifying the network structure or requiring...

10.1109/iscas.2017.8050797 article EN 2017 IEEE International Symposium on Circuits and Systems (ISCAS) 2017-05-01
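
One plausible reading of the prediction idea, sketched for a single dot product: a most-significant-bit (MSB) estimate decides whether the full computation can be skipped because ReLU would zero the result anyway. The bit-widths and skip rule are assumptions, not the paper's exact scheme.

```python
import numpy as np

def predictive_relu_dot(x, w, msb_bits=4, total_bits=8):
    """Estimate a pre-activation from the MSB portion of the quantized input;
    if the estimate is negative, skip the remaining computation since ReLU
    would output zero (x and w are 1-D; parameters are illustrative)."""
    scale = max(float(np.abs(x).max()), 1e-8) / (2 ** (total_bits - 1) - 1)
    xq = np.round(x / scale).astype(np.int32)
    shift = total_bits - msb_bits
    x_msb = (xq >> shift) << shift          # keep only the MSB portion
    estimate = float(x_msb @ w) * scale     # cheap predicted pre-activation
    if estimate < 0:
        return 0.0                          # predicted to be zeroed by ReLU
    return max(float(x @ w), 0.0)           # full computation only when needed
```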

Post-training quantization (PTQ) is a promising approach to reducing the storage and computational requirements of large language models (LLMs) without additional training cost. Recent PTQ studies have primarily focused on quantizing only weights to sub-8-bits while maintaining activations at 8-bits or higher. Accurate sub-8-bit quantization for both weights and activations without relying on quantization-aware training remains a significant challenge. We propose a novel method called block clustered quantization (BCQ) wherein each operand tensor is decomposed into...

10.48550/arxiv.2502.05376 preprint EN arXiv (Cornell University) 2025-02-07
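
A hedged sketch of one way to realize the block-clustered idea: blocks of a tensor are clustered by their dynamic range and all blocks in a cluster share one quantization scale. The clustering feature and the sharing scheme below are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def bcq_quantize(x, bits=4, block=64, n_clusters=8, iters=20):
    """Blocks clustered by dynamic range via 1-D k-means; each cluster shares
    a scale. Assumes x.size is divisible by `block`; parameters illustrative."""
    qmax = 2 ** (bits - 1) - 1
    blocks = x.reshape(-1, block)
    feat = np.abs(blocks).max(axis=1)                  # per-block dynamic range
    centers = np.quantile(feat, np.linspace(0, 1, n_clusters))
    for _ in range(iters):                             # simple 1-D k-means
        assign = np.abs(feat[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centers[k] = feat[assign == k].mean()
    scales = np.maximum(centers[assign], 1e-8) / qmax  # shared scale per cluster
    q = np.clip(np.round(blocks / scales[:, None]), -qmax, qmax)
    return q.astype(np.int8), scales, assign
```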

As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making. This paper presents HarDNN, a software-directed approach to identify vulnerable computations during CNN inference and selectively protect them based on their propensity...

10.48550/arxiv.2002.09786 preprint EN other-oa arXiv (Cornell University) 2020-01-01
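
A toy sketch of software-directed selective protection: only the outputs flagged as most vulnerable by an (assumed) offline sensitivity analysis are duplicated and checked. The mechanism shown is a generic duplicate-and-compare, not necessarily the paper's exact protection scheme.

```python
import numpy as np

def selectively_protected_linear(x, W, vulnerable_rows):
    """Primary computation plus duplicated work for a protected subset of
    output neurons; `vulnerable_rows` is assumed to come from an offline
    sensitivity ranking."""
    y = W @ x                                   # primary computation
    y_check = W[vulnerable_rows] @ x            # duplicated, low-cost protection
    mismatch = ~np.isclose(y[vulnerable_rows], y_check)
    if mismatch.any():
        # a transient error was caught in a protected computation; recompute
        y[vulnerable_rows] = W[vulnerable_rows] @ x
    return y
```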

There has been growing interest in the deployment of deep learning systems onto resource-constrained platforms for fast and efficient inference. However, typical models are overwhelmingly complex, making such integration very challenging and requiring compression mechanisms such as reduced precision. We present a layer-wise granular precision analysis which allows us to efficiently quantize pre-trained neural networks at minimal cost in terms of accuracy degradation. Our results are consistent with recent...

10.1109/icassp.2018.8461702 article EN 2018-04-01
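
For intuition, a simplified empirical stand-in for a layer-wise precision study: sweep each layer's bit-width independently and keep the smallest one that stays within an accuracy budget. The paper reaches its assignment analytically rather than by sweeping; `eval_fn` is an assumed user-supplied evaluator.

```python
import numpy as np

def quantize(t, bits):
    """Uniform symmetric quantizer used for the sweep (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    m = float(np.abs(t).max())
    s = m / qmax if m > 0 else 1.0
    return np.clip(np.round(t / s), -qmax, qmax) * s

def per_layer_precision(weights, eval_fn, baseline, budget=0.01, bit_range=(2, 9)):
    """For each layer independently, pick the smallest bit-width whose
    accuracy drop stays within `budget` of the float baseline."""
    chosen = {}
    for name, w in weights.items():
        for b in range(*bit_range):
            trial = dict(weights)
            trial[name] = quantize(w, b)
            if baseline - eval_fn(trial) <= budget:
                chosen[name] = b
                break
        else:
            chosen[name] = bit_range[1] - 1   # fall back to the largest width
    return chosen
```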

As CNNs are being extensively employed in high-performance and safety-critical applications that demand reliability, it is important to ensure that they are resilient to transient hardware errors. Traditional full-redundancy solutions provide high error coverage, but the associated overheads are often prohibitive for resource-constrained systems. In this work, we propose software-directed selective protection techniques that target the most vulnerable work in a CNN, providing a low-cost solution. We evaluate two...

10.1109/issre52982.2021.00025 article EN 2021-10-01

With the ever-growing popularity of deep learning, the tremendous complexity of neural networks is becoming problematic when one considers inference on resource-constrained platforms. Binary networks have emerged as a potential solution; however, they exhibit a fundamental limitation in realizing gradient-based learning because their activations are non-differentiable. Current work has so far relied on approximating gradients in order to use the back-propagation algorithm via the straight-through estimator (STE). Such...

10.1109/icassp.2018.8461456 article EN 2018-04-01
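
A minimal NumPy illustration of the straight-through estimator (STE) mentioned above: the forward pass uses the non-differentiable sign activation, and the backward pass substitutes a clipped-identity gradient.

```python
import numpy as np

def binarize_forward(x):
    # forward pass: non-differentiable sign activation
    return np.where(x >= 0, 1.0, -1.0)

def binarize_backward_ste(x, grad_out):
    # straight-through estimator: pass the upstream gradient where |x| <= 1,
    # zero it elsewhere (clipped identity stands in for the sign derivative)
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-1.7, -0.3, 0.4, 2.1])
g = np.ones_like(x)
y = binarize_forward(x)             # [-1., -1.,  1.,  1.]
dx = binarize_backward_ste(x, g)    # [ 0.,  1.,  1.,  0.]
```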

The high computational and parameter complexity of neural networks makes their training very slow and difficult to deploy on energy- and storage-constrained computing systems. Many network complexity reduction techniques have been proposed, including fixed-point implementation. However, a systematic approach for designing full fixed-point training and inference of deep networks remains elusive. We describe a precision assignment methodology in which all network parameters, i.e., activations and weights in the feedforward path, and gradients and weight accumulators in the feedback...

10.48550/arxiv.1812.11732 preprint EN other-oa arXiv (Cornell University) 2018-01-01
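
A sketch of the kind of per-tensor fixed-point assignment such a methodology produces. The quantizer below is standard; the specific bit-width table is purely hypothetical and only illustrates that every tensor class in the forward and backward paths receives its own precision.

```python
import numpy as np

def fxp(t, bits, frac_bits):
    """Fixed-point quantizer: `bits` total bits, `frac_bits` fractional bits."""
    step = 2.0 ** (-frac_bits)
    lo = -(2 ** (bits - 1)) * step
    hi = (2 ** (bits - 1) - 1) * step
    return np.clip(np.round(t / step) * step, lo, hi)

# Hypothetical per-tensor assignment (bits, frac_bits), for illustration only.
precisions = {
    "activations":         (8, 6),
    "weights":             (8, 7),
    "gradients":           (12, 14),
    "weight_accumulators": (24, 20),
}

w = np.random.randn(128, 128)
w_q = fxp(w, *precisions["weights"])   # apply the assigned precision to one tensor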

This article presents a deep learning-based classifier IC for keyword spotting (KWS) in 65-nm CMOS designed using an algorithm-hardware co-design approach. First, a recurrent attention model (RAM) algorithm for the KWS task (the KeyRAM algorithm) is proposed. The algorithm enables accuracy versus energy scalability via a confidence-based computation (CC) scheme, leading to a 2.5× reduction in computational complexity compared to state-of-the-art (SOTA) neural networks, and is well-suited to in-memory computing (IMC) since...

10.1109/jssc.2020.3029586 article EN IEEE Journal of Solid-State Circuits 2020-10-26
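
A toy sketch of a confidence-based computation (CC) scheme in the spirit described above: run a cheap pass first and fall back to the full model only when the prediction confidence is low. The two model callables and the threshold are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def confidence_based_classify(x, cheap_model, full_model, threshold=0.9):
    """cheap_model and full_model are assumed callables returning class logits."""
    p = softmax(cheap_model(x))
    if p.max() >= threshold:
        return int(p.argmax())          # confident: stop early and save energy
    return int(softmax(full_model(x)).argmax())
```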

This paper obtains the fundamental limits on the computational precision of in-memory computing architectures (IMCs). Various compute SNR metrics for IMCs are defined and their interrelationships analyzed to show that accuracy is fundamentally limited by the compute SNR (SNRa) of the analog core, and that activation, weight, and output precision needs to be assigned appropriately for the final output SNR to approach it (SNRT → SNRa). The minimum precision criterion (MPC) is proposed to minimize the output, and hence column analog-to-digital converter (ADC), precision. A charge summing (QS) model and associated IMC...

10.1145/3400302.3416344 article EN 2020-11-02
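
The precision-assignment argument in the abstract can be summarized by a standard additive-noise relation; the paper's exact formulation may differ, so treat the following as a hedged reconstruction under an independent-noise assumption.

```latex
% With independent additive noise sources (analog core noise and quantization
% noise) and a common signal power, inverse SNRs add:
\frac{1}{\mathrm{SNR}_T} = \frac{1}{\mathrm{SNR}_a} + \frac{1}{\mathrm{SQNR}}
\quad\Longrightarrow\quad
\mathrm{SNR}_T \to \mathrm{SNR}_a \ \text{when}\ \mathrm{SQNR} \gg \mathrm{SNR}_a
% i.e. precision should be assigned so quantization noise is negligible,
% leaving the analog core as the accuracy ceiling.
```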

This paper presents signal processing methods to enhance the energy vs. accuracy trade-off of in-memory computing (IMC) architectures. First, an optimal clipping criterion (OCC) for quantization is proposed in order to minimize the precision of column analog-to-digital converters (ADCs) at iso-accuracy. For a Gaussian-distributed signal, the OCC is shown to reduce the ADC precision requirement by 3 bits at a signal-to-quantization noise ratio (SQNR) of 22.5 dB over the commonly used full-range (FR) quantizer. Next, input-sliced...

10.1109/tsp.2021.3130488 article EN cc-by IEEE Transactions on Signal Processing 2021-01-01
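
A numerical stand-in for the optimal clipping criterion: sweep clipping levels for a Gaussian signal and pick the one that minimizes end-to-end quantization MSE. The paper gives a closed-form criterion; this sketch only locates the optimum empirically, with illustrative parameters.

```python
import numpy as np

def clipped_quant_mse(x, clip, bits):
    """Monte-Carlo MSE of a uniform quantizer with clipping level `clip`."""
    step = 2 * clip / (2 ** bits)
    xq = np.clip(x, -clip, clip)
    xq = np.round(xq / step) * step
    return np.mean((x - xq) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)          # Gaussian-distributed signal
bits = 4
clips = np.linspace(0.5, 6.0, 56)
mses = [clipped_quant_mse(x, c, bits) for c in clips]
best_clip = clips[int(np.argmin(mses))]     # empirical MSE-optimal clipping level
```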

New collective optical properties have emerged recently from organized and oriented arrays of closely packed semiconducting and metallic nanoparticles (NPs). However, it is still challenging to obtain NP assemblies which are similar everywhere on a given sample and, most importantly, share a unique common orientation that would guarantee uniform behavior across the sample. In this context, by combining microscopy, fluorescence microscopy, and synchrotron-based grazing incidence small-angle X-ray scattering (GISAXS), gold...

10.1039/d2sm00376g article EN Soft Matter 2022-01-01

It is well-known that the precision of the data, weight vector, and internal representations employed in learning systems directly impacts their energy, throughput, and latency. The precision requirements of the training algorithm are also important when learning on-the-fly. In this paper, we present analytical lower bounds on the precision of the commonly used stochastic gradient descent (SGD) on-line learning algorithm in the specific context of a support vector machine (SVM). These bounds are obtained subject to a desired system performance and are validated using the UCI breast cancer dataset....

10.1109/icassp.2017.7952334 article EN 2017-03-01
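
To make the setting concrete, a minimal sketch of on-line SGD for a linear SVM with quantized data and weights. The bit-widths and ranges are illustrative; the paper derives the actual lower bounds on them analytically.

```python
import numpy as np

def quantize(t, bits, rng_max):
    step = 2 * rng_max / (2 ** bits)
    return np.clip(np.round(t / step) * step, -rng_max, rng_max)

def fixed_point_svm_sgd(X, y, bits_w=8, bits_x=8, lr=0.01, lam=1e-3, epochs=5):
    """On-line hinge-loss SGD with quantized inputs and weights; labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    x_range = float(np.abs(X).max())
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            xq = quantize(xi, bits_x, x_range)         # quantized data sample
            margin = yi * (w @ xq)
            grad = lam * w - (yi * xq if margin < 1 else 0.0)
            w = quantize(w - lr * grad, bits_w, 4.0)   # quantized weight update
    return w
```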

This article obtains fundamental limits on the computational precision of in-memory computing architectures (IMCs). An IMC noise model and associated signal-to-noise ratio (SNR) metrics are defined and their interrelationships analyzed to show that the accuracy of IMCs is fundamentally limited by the compute SNR (SNRa) of its...

10.1109/tcad.2021.3124757 article EN cc-by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2021-11-02

Data clipping is crucial in reducing the noise of quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, QAT is formulated with provably minimum quantization noise at each...

10.48550/arxiv.2206.06501 preprint EN cc-by arXiv (Cornell University) 2022-01-01
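
A sketch of an OCTAV-style Newton-Raphson fixed-point iteration for the MSE-optimal clipping scalar of a B-bit quantizer. The exact recursion should be checked against the paper; treat the form below as a reconstruction under that caveat.

```python
import numpy as np

def octav_clip(x, bits, iters=20):
    """Fixed-point iteration for an MSE-optimal clipping scalar: clipped mass
    is weighted fully, unclipped mass by the uniform-quantizer factor 4^{-B}/3."""
    a = np.abs(x).ravel()
    s = float(a.mean()) + 1e-12               # initial guess
    c = (4.0 ** (-bits)) / 3.0
    for _ in range(iters):
        above = a > s
        num = a[above].sum()
        den = c * (~above).sum() + above.sum()
        s = float(num) / max(float(den), 1e-12)
    return s
```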

Efforts to reduce the numerical precision of computations in deep learning training have yielded systems that aggressively quantize weights and activations, yet employ wide high-precision accumulators for partial sums in inner-product operations to preserve the quality of convergence. The absence of any framework to analyze the precision requirements of partial sum accumulations results in conservative design choices. This imposes an upper bound on the reduction in complexity of multiply-accumulate units. We present a statistical approach to analyze the impact...

10.48550/arxiv.1901.06588 preprint EN other-oa arXiv (Cornell University) 2019-01-01
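
A small NumPy experiment illustrating why narrow partial-sum accumulators degrade long inner products, which is the effect such a statistical framework bounds. Sizes and dtypes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
a = rng.standard_normal(n).astype(np.float16) * np.float16(0.01)
b = rng.standard_normal(n).astype(np.float16) * np.float16(0.01)

# Narrow accumulator: partial sums kept in float16, so rounding error builds up
# as small products are swamped by the growing partial sum.
acc16 = np.float16(0.0)
for p in (a * b):
    acc16 = np.float16(acc16 + p)

# Wide accumulator baseline: the same inner product accumulated in float32.
acc32 = np.float32(a.astype(np.float32) @ b.astype(np.float32))
rel_err = abs(float(acc16) - float(acc32)) / abs(float(acc32))
```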

This paper presents a 0.34 µJ/decision deep learning-based classifier for keyword spotting (KWS) in 65 nm CMOS with all weights stored on-chip. The work adapts the Recurrent Attention Model (RAM) algorithm to the KWS task, and employs an in-memory computing (IMC) architecture to achieve up to 9× savings in energy/decision and more than 23× savings in the EDP of decisions over the state-of-the-art IMC IC using the Google Speech dataset, while achieving the highest reported decision throughput of 18.32 k decisions/s.

10.1109/cicc48029.2020.9075923 article EN 2020 IEEE Custom Integrated Circuits Conference (CICC) 2020-03-01

We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior works on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. The activation-centrality of the approach enables retraining LLMs with no loss of expressivity, while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal...

10.48550/arxiv.2410.05437 preprint EN arXiv (Cornell University) 2024-10-07
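
A minimal NumPy sketch of the activation-side projection idea: calibrate a rank-k projection from activation statistics, then fold W·P into a smaller matrix so the cheaper path at inference follows from matrix multiplication associativity. The shapes, calibration data, and k below are illustrative.

```python
import numpy as np

def calibrate_projection(acts, k):
    """Pre-calibrate a rank-k projection from activation samples (rows = samples)."""
    cov = acts.T @ acts / acts.shape[0]
    _, vecs = np.linalg.eigh(cov)        # eigenvectors in ascending eigenvalue order
    return vecs[:, -k:]                  # top-k principal directions

d_in, d_out, k = 512, 512, 128
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
calib_acts = rng.standard_normal((4096, d_in))
P = calibrate_projection(calib_acts, k)

WP = W @ P                               # (d_out, k) folded weight, computed once
x = rng.standard_normal(d_in)
y_compressed = WP @ (P.T @ x)            # cheaper path used at inference
y_reference = W @ (P @ (P.T @ x))        # identical result by associativity
```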

In this work, we re-formulate the model compression problem into a customized compensation problem: given a compressed model, we aim to introduce residual low-rank paths to compensate for compression errors under customized requirements from users (e.g., tasks, compression ratios), resulting in greater flexibility in adjusting overall capacity without being constrained by specific compression formats. However, naively applying SVD to derive these paths causes suboptimal utilization of the low-rank representation capacity. Instead, we propose Training-free Eigenspace Low-Rank...

10.48550/arxiv.2410.21271 preprint EN arXiv (Cornell University) 2024-10-28
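
For contrast, a sketch of the naive SVD-based compensation that the abstract identifies as suboptimal: approximate the compression error with a rank-r residual path that is added back at inference without altering the compressed weight's format.

```python
import numpy as np

def lowrank_residual(W, W_compressed, rank):
    """Rank-r factors (A, B) approximating the compression error W - W_compressed."""
    U, S, Vt = np.linalg.svd(W - W_compressed, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # (d_out, r), columns scaled by singular values
    B = Vt[:rank]                     # (r, d_in)
    return A, B

# At inference: y ≈ W_compressed @ x + A @ (B @ x); the residual path restores
# accuracy while W_compressed keeps its original compressed format.
```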

Deep neural networks (DNNs) are powerful machine learning models but are typically deployed in large computing clusters due to their high computational and parameter complexity. Many biomedical applications require embedded inference on resource-constrained platforms, thus posing a challenge when considering the deployment of DNNs. One method to address this is via reduced-precision implementations. We use an analytical method to determine suitable minimum precision requirements for DNNs and show its application to the CHB-MIT EEG...

10.1109/biocas.2018.8584732 article EN 2018 IEEE Biomedical Circuits and Systems Conference (BioCAS) 2018-10-01