Charbel Sakr

ORCID: 0000-0001-5641-0541
Research Areas
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Neural Networks and Applications
  • Adversarial Robustness in Machine Learning
  • Semiconductor materials and devices
  • Machine Learning and Algorithms
  • Radiation Effects in Electronics
  • Parallel Computing and Optimization Techniques
  • CCD and CMOS Imaging Sensors
  • Machine Learning and ELM
  • Machine Learning and Data Classification
  • Anomaly Detection Techniques and Applications
  • Sparse and Compressive Sensing Techniques
  • Fluid Dynamics Simulations and Interactions
  • Tensor decomposition and applications
  • Medical Image Segmentation Techniques
  • Robotics and Sensor-Based Localization
  • Real-time simulation and control systems
  • Image and Object Detection Techniques
  • Advanced Image and Video Retrieval Techniques
  • Embedded Systems Design Techniques
  • Speech Recognition and Synthesis
  • Stochastic Gradient Optimization Techniques
  • Brain Tumor Detection and Classification

Nvidia (United States)
2021-2023

Institut des NanoSciences de Paris
2022

Sorbonne Université
2022

Centre National de la Recherche Scientifique
2022

European Synchrotron Radiation Facility
2022

University of Illinois Urbana-Champaign
2017-2021

Nvidia (United Kingdom)
2020

The energy efficiency of deep neural network (DNN) inference can be improved with custom accelerators. DNN accelerators often employ specialized hardware techniques to improve efficiency, but many of these result in catastrophic accuracy loss on transformer-based DNNs, which have become ubiquitous for natural language processing (NLP) tasks. This article presents an accelerator designed for efficient execution of transformers. The proposed design implements per-vector scaled quantization (VSQ) and employs an...

10.1109/jssc.2023.3234893 article EN IEEE Journal of Solid-State Circuits 2023-01-18
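
A minimal NumPy sketch of per-vector scaled quantization (VSQ) as described above: each short vector of elements carries its own scale factor. The vector length and bit-width below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def vsq_quantize(x, bits=4, vec_len=16):
    """Per-vector scaled quantization sketch: each contiguous group of
    `vec_len` elements gets its own scale factor (illustrative parameters).
    Assumes x.size is divisible by vec_len."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit signed codes
    v = x.reshape(-1, vec_len)
    scale = np.abs(v).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)       # avoid division by zero
    q = np.clip(np.round(v / scale), -qmax, qmax)  # integer codes
    return q.astype(np.int8), scale

def vsq_dequantize(q, scale, shape):
    return (q * scale).reshape(shape)

x = np.random.randn(8, 64).astype(np.float32)
q, s = vsq_quantize(x)
err = np.abs(vsq_dequantize(q, s, x.shape) - x).mean()  # per-vector scaling keeps this small
```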

Convolutional neural networks (CNNs) have gained considerable interest due to their record-breaking performance in many recognition tasks. However, the computational complexity of CNNs precludes their deployment on power-constrained embedded platforms. In this paper, we propose the predictive CNN (PredictiveNet), which predicts the sparse outputs of non-linear layers, thereby bypassing a majority of computations. PredictiveNet skips a large fraction of convolutions at runtime without modifying the network structure or requiring...

10.1109/iscas.2017.8050797 article EN 2017 IEEE International Symposium on Circuits and Systems (ISCAS) 2017-05-01
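
One plausible reading of the prediction idea, sketched for a single dot product: a most-significant-bit (MSB) estimate decides whether the full computation can be skipped because ReLU would zero the result anyway. The bit-widths and skip rule are assumptions, not the paper's exact scheme.

```python
import numpy as np

def predictive_relu_dot(x, w, msb_bits=4, total_bits=8):
    """Estimate a pre-activation from the MSB portion of the quantized input;
    if the estimate is negative, skip the remaining computation since ReLU
    would output zero (x and w are 1-D; parameters are illustrative)."""
    scale = max(float(np.abs(x).max()), 1e-8) / (2 ** (total_bits - 1) - 1)
    xq = np.round(x / scale).astype(np.int32)
    shift = total_bits - msb_bits
    x_msb = (xq >> shift) << shift          # keep only the MSB portion
    estimate = float(x_msb @ w) * scale     # cheap predicted pre-activation
    if estimate < 0:
        return 0.0                          # predicted to be zeroed by ReLU
    return max(float(x @ w), 0.0)           # full computation only when needed
```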

Post-training quantization (PTQ) is a promising approach to reducing the storage and computational requirements of large language models (LLMs) without additional training cost. Recent PTQ studies have primarily focused on quantizing only weights to sub-8-bits while maintaining activations at 8-bits or higher. Accurate sub-8-bit quantization for both weights and activations without relying on quantization-aware training remains a significant challenge. We propose a novel method called block clustered quantization (BCQ) wherein each operand tensor is decomposed into...

10.48550/arxiv.2502.05376 preprint EN arXiv (Cornell University) 2025-02-07
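
A hedged sketch of one way to realize the block-clustered idea: blocks of a tensor are clustered by their dynamic range and all blocks in a cluster share one quantization scale. The clustering feature and the sharing scheme below are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def bcq_quantize(x, bits=4, block=64, n_clusters=8, iters=20):
    """Blocks clustered by dynamic range via 1-D k-means; each cluster shares
    a scale. Assumes x.size is divisible by `block`; parameters illustrative."""
    qmax = 2 ** (bits - 1) - 1
    blocks = x.reshape(-1, block)
    feat = np.abs(blocks).max(axis=1)                  # per-block dynamic range
    centers = np.quantile(feat, np.linspace(0, 1, n_clusters))
    for _ in range(iters):                             # simple 1-D k-means
        assign = np.abs(feat[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centers[k] = feat[assign == k].mean()
    scales = np.maximum(centers[assign], 1e-8) / qmax  # shared scale per cluster
    q = np.clip(np.round(blocks / scales[:, None]), -qmax, qmax)
    return q.astype(np.int8), scales, assign
```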

As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making. This paper presents HarDNN, a software-directed approach to identify vulnerable computations during CNN inference and selectively protect them based on their propensity...

10.48550/arxiv.2002.09786 preprint EN other-oa arXiv (Cornell University) 2020-01-01
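
A toy sketch of software-directed selective protection: only the outputs flagged as most vulnerable by an (assumed) offline sensitivity analysis are duplicated and checked. The mechanism shown is a generic duplicate-and-compare, not necessarily the paper's exact protection scheme.

```python
import numpy as np

def selectively_protected_linear(x, W, vulnerable_rows):
    """Primary computation plus duplicated work for a protected subset of
    output neurons; `vulnerable_rows` is assumed to come from an offline
    sensitivity ranking."""
    y = W @ x                                   # primary computation
    y_check = W[vulnerable_rows] @ x            # duplicated, low-cost protection
    mismatch = ~np.isclose(y[vulnerable_rows], y_check)
    if mismatch.any():
        # a transient error was caught in a protected computation; recompute
        y[vulnerable_rows] = W[vulnerable_rows] @ x
    return y
```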

There has been growing interest in the deployment of deep learning systems onto resource-constrained platforms for fast and efficient inference. However, typical models are overwhelmingly complex, making such integration very challenging and requiring compression mechanisms such as reduced precision. We present a layer-wise granular precision analysis which allows us to efficiently quantize pre-trained neural networks at minimal cost in terms of accuracy degradation. Our results are consistent with recent...

10.1109/icassp.2018.8461702 article EN 2018-04-01
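
For intuition, a simplified empirical stand-in for a layer-wise precision study: sweep each layer's bit-width independently and keep the smallest one that stays within an accuracy budget. The paper reaches its assignment analytically rather than by sweeping; `eval_fn` is an assumed user-supplied evaluator.

```python
import numpy as np

def quantize(t, bits):
    """Uniform symmetric quantizer used for the sweep (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    m = float(np.abs(t).max())
    s = m / qmax if m > 0 else 1.0
    return np.clip(np.round(t / s), -qmax, qmax) * s

def per_layer_precision(weights, eval_fn, baseline, budget=0.01, bit_range=(2, 9)):
    """For each layer independently, pick the smallest bit-width whose
    accuracy drop stays within `budget` of the float baseline."""
    chosen = {}
    for name, w in weights.items():
        for b in range(*bit_range):
            trial = dict(weights)
            trial[name] = quantize(w, b)
            if baseline - eval_fn(trial) <= budget:
                chosen[name] = b
                break
        else:
            chosen[name] = bit_range[1] - 1   # fall back to the largest width
    return chosen
```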

As CNNs are being extensively employed in high-performance and safety-critical applications that demand reliability, it is important to ensure that they are resilient to transient hardware errors. Traditional full-redundancy solutions provide high error coverage, but the associated overheads are often prohibitive for resource-constrained systems. In this work, we propose software-directed selective protection techniques that target the most vulnerable work in a CNN, providing a low-cost solution. We evaluate two...

10.1109/issre52982.2021.00025 article EN 2021-10-01

With the ever-growing popularity of deep learning, the tremendous complexity of neural networks is becoming problematic when one considers inference on resource-constrained platforms. Binary networks have emerged as a potential solution; however, they exhibit a fundamental limitation in realizing gradient-based learning because their activations are non-differentiable. Current work has so far relied on approximating gradients in order to use the back-propagation algorithm via the straight-through estimator (STE). Such...

10.1109/icassp.2018.8461456 article EN 2018-04-01
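
A minimal NumPy illustration of the straight-through estimator (STE) mentioned above: the forward pass uses the non-differentiable sign activation, and the backward pass substitutes a clipped-identity gradient.

```python
import numpy as np

def binarize_forward(x):
    # forward pass: non-differentiable sign activation
    return np.where(x >= 0, 1.0, -1.0)

def binarize_backward_ste(x, grad_out):
    # straight-through estimator: pass the upstream gradient where |x| <= 1,
    # zero it elsewhere (clipped identity stands in for the sign derivative)
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-1.7, -0.3, 0.4, 2.1])
g = np.ones_like(x)
y = binarize_forward(x)             # [-1., -1.,  1.,  1.]
dx = binarize_backward_ste(x, g)    # [ 0.,  1.,  1.,  0.]
```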

The high computational and parameter complexity of neural networks makes their training very slow and difficult to deploy on energy- and storage-constrained computing systems. Many network complexity reduction techniques have been proposed, including fixed-point implementation. However, a systematic approach for designing full fixed-point training and inference of deep networks remains elusive. We describe a precision assignment methodology in which all network parameters, i.e., activations and weights in the feedforward path, and gradients and weight accumulators in the feedback...

10.48550/arxiv.1812.11732 preprint EN other-oa arXiv (Cornell University) 2018-01-01
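
A sketch of the kind of per-tensor fixed-point assignment such a methodology produces. The quantizer below is standard; the specific bit-width table is purely hypothetical and only illustrates that every tensor class in the forward and backward paths receives its own precision.

```python
import numpy as np

def fxp(t, bits, frac_bits):
    """Fixed-point quantizer: `bits` total bits, `frac_bits` fractional bits."""
    step = 2.0 ** (-frac_bits)
    lo = -(2 ** (bits - 1)) * step
    hi = (2 ** (bits - 1) - 1) * step
    return np.clip(np.round(t / step) * step, lo, hi)

# Hypothetical per-tensor assignment (bits, frac_bits), for illustration only.
precisions = {
    "activations":         (8, 6),
    "weights":             (8, 7),
    "gradients":           (12, 14),
    "weight_accumulators": (24, 20),
}

w = np.random.randn(128, 128)
w_q = fxp(w, *precisions["weights"])   # apply the assigned precision to one tensor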

This article presents a deep learning-based classifier IC for keyword spotting (KWS) in 65-nm CMOS designed using an algorithm-hardware co-design approach. First, a recurrent attention model (RAM) algorithm for the KWS task (the KeyRAM algorithm) is proposed. The algorithm enables accuracy versus energy scalability via a confidence-based computation (CC) scheme, leading to a 2.5× reduction in computational complexity compared to state-of-the-art (SOTA) neural networks, and is well-suited to in-memory computing (IMC) since...

10.1109/jssc.2020.3029586 article EN IEEE Journal of Solid-State Circuits 2020-10-26
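
A toy sketch of a confidence-based computation (CC) scheme in the spirit described above: run a cheap pass first and fall back to the full model only when the prediction confidence is low. The two model callables and the threshold are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def confidence_based_classify(x, cheap_model, full_model, threshold=0.9):
    """cheap_model and full_model are assumed callables returning class logits."""
    p = softmax(cheap_model(x))
    if p.max() >= threshold:
        return int(p.argmax())          # confident: stop early and save energy
    return int(softmax(full_model(x)).argmax())
```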

This paper obtains the fundamental limits on the computational precision of in-memory computing architectures (IMCs). Various compute SNR metrics for IMCs are defined and their interrelationships analyzed to show that accuracy is fundamentally limited by the compute SNR (SNRa) of the analog core, and that activation, weight, and output precision needs to be assigned appropriately for the final output SNR to approach it (SNRT → SNRa). The minimum precision criterion (MPC) is proposed to minimize the output, and hence column analog-to-digital converter (ADC), precision. A charge summing (QS) model and associated IMC...

10.1145/3400302.3416344 article EN 2020-11-02
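
The precision-assignment argument in the abstract can be summarized by a standard additive-noise relation; the paper's exact formulation may differ, so treat the following as a hedged reconstruction under an independent-noise assumption.

```latex
% With independent additive noise sources (analog core noise and quantization
% noise) and a common signal power, inverse SNRs add:
\frac{1}{\mathrm{SNR}_T} = \frac{1}{\mathrm{SNR}_a} + \frac{1}{\mathrm{SQNR}}
\quad\Longrightarrow\quad
\mathrm{SNR}_T \to \mathrm{SNR}_a \ \text{when}\ \mathrm{SQNR} \gg \mathrm{SNR}_a
% i.e. precision should be assigned so quantization noise is negligible,
% leaving the analog core as the accuracy ceiling.
```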

This paper presents signal processing methods to enhance the energy vs. accuracy trade-off of in-memory computing (IMC) architectures. First, an optimal clipping criterion (OCC) for quantization is proposed in order to minimize the precision of column analog-to-digital converters (ADCs) at iso-accuracy. For a Gaussian-distributed signal, the OCC is shown to reduce the ADC precision requirement by 3 bits at a signal-to-quantization noise ratio (SQNR) of 22.5 dB over the commonly used full-range (FR) quantizer. Next, input-sliced...

10.1109/tsp.2021.3130488 article EN cc-by IEEE Transactions on Signal Processing 2021-01-01
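
A numerical stand-in for the optimal clipping criterion: sweep clipping levels for a Gaussian signal and pick the one that minimizes end-to-end quantization MSE. The paper gives a closed-form criterion; this sketch only locates the optimum empirically, with illustrative parameters.

```python
import numpy as np

def clipped_quant_mse(x, clip, bits):
    """Monte-Carlo MSE of a uniform quantizer with clipping level `clip`."""
    step = 2 * clip / (2 ** bits)
    xq = np.clip(x, -clip, clip)
    xq = np.round(xq / step) * step
    return np.mean((x - xq) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)          # Gaussian-distributed signal
bits = 4
clips = np.linspace(0.5, 6.0, 56)
mses = [clipped_quant_mse(x, c, bits) for c in clips]
best_clip = clips[int(np.argmin(mses))]     # empirical MSE-optimal clipping level
```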

New collective optical properties have emerged recently from organized and oriented arrays of closely packed semiconducting and metallic nanoparticles (NPs). However, it is still challenging to obtain NP assemblies which are similar everywhere on a given sample and, most importantly, share a unique common orientation that would guarantee uniform behavior across the sample. In this context, by combining microscopy, fluorescence microscopy, and synchrotron-based grazing incidence small-angle X-ray scattering (GISAXS), gold...

10.1039/d2sm00376g article EN Soft Matter 2022-01-01

It is well-known that the precision of the data, weight vector, and internal representations employed in learning systems directly impacts their energy, throughput, and latency. The precision requirements of the training algorithm are also important when learning on-the-fly. In this paper, we present analytical lower bounds on the precision of the commonly used stochastic gradient descent (SGD) on-line learning algorithm in the specific context of a support vector machine (SVM). These bounds are obtained subject to a desired system performance and are validated using the UCI breast cancer dataset....

10.1109/icassp.2017.7952334 article EN 2017-03-01
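
To make the setting concrete, a minimal sketch of on-line SGD for a linear SVM with quantized data and weights. The bit-widths and ranges are illustrative; the paper derives the actual lower bounds on them analytically.

```python
import numpy as np

def quantize(t, bits, rng_max):
    step = 2 * rng_max / (2 ** bits)
    return np.clip(np.round(t / step) * step, -rng_max, rng_max)

def fixed_point_svm_sgd(X, y, bits_w=8, bits_x=8, lr=0.01, lam=1e-3, epochs=5):
    """On-line hinge-loss SGD with quantized inputs and weights; labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    x_range = float(np.abs(X).max())
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            xq = quantize(xi, bits_x, x_range)         # quantized data sample
            margin = yi * (w @ xq)
            grad = lam * w - (yi * xq if margin < 1 else 0.0)
            w = quantize(w - lr * grad, bits_w, 4.0)   # quantized weight update
    return w
```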

This article obtains fundamental limits on the computational precision of in-memory computing architectures (IMCs). An IMC noise model and associated signal-to-noise ratio (SNR) metrics are defined and their interrelationships analyzed to show that the accuracy of IMCs is fundamentally limited by the compute SNR (SNRa) of its...

10.1109/tcad.2021.3124757 article EN cc-by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2021-11-02

Data clipping is crucial in reducing the noise of quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, QAT is formulated with provably minimum quantization noise at each...

10.48550/arxiv.2206.06501 preprint EN cc-by arXiv (Cornell University) 2022-01-01
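
A sketch of an OCTAV-style Newton-Raphson fixed-point iteration for the MSE-optimal clipping scalar of a B-bit quantizer. The exact recursion should be checked against the paper; treat the form below as a reconstruction under that caveat.

```python
import numpy as np

def octav_clip(x, bits, iters=20):
    """Fixed-point iteration for an MSE-optimal clipping scalar: clipped mass
    is weighted fully, unclipped mass by the uniform-quantizer factor 4^{-B}/3."""
    a = np.abs(x).ravel()
    s = float(a.mean()) + 1e-12               # initial guess
    c = (4.0 ** (-bits)) / 3.0
    for _ in range(iters):
        above = a > s
        num = a[above].sum()
        den = c * (~above).sum() + above.sum()
        s = float(num) / max(float(den), 1e-12)
    return s
```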

Efforts to reduce the numerical precision of computations in deep learning training have yielded systems that aggressively quantize weights and activations, yet employ wide high-precision accumulators for partial sums in inner-product operations to preserve the quality of convergence. The absence of any framework to analyze the precision requirements of partial sum accumulations results in conservative design choices. This imposes an upper bound on the reduction in complexity of multiply-accumulate units. We present a statistical approach to analyze the impact...

10.48550/arxiv.1901.06588 preprint EN other-oa arXiv (Cornell University) 2019-01-01
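
A small NumPy experiment illustrating why narrow partial-sum accumulators degrade long inner products, which is the effect such a statistical framework bounds. Sizes and dtypes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
a = rng.standard_normal(n).astype(np.float16) * np.float16(0.01)
b = rng.standard_normal(n).astype(np.float16) * np.float16(0.01)

# Narrow accumulator: partial sums kept in float16, so rounding error builds up
# as small products are swamped by the growing partial sum.
acc16 = np.float16(0.0)
for p in (a * b):
    acc16 = np.float16(acc16 + p)

# Wide accumulator baseline: the same inner product accumulated in float32.
acc32 = np.float32(a.astype(np.float32) @ b.astype(np.float32))
rel_err = abs(float(acc16) - float(acc32)) / abs(float(acc32))
```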

This paper presents a 0.34 µJ/decision deep learning-based classifier for keyword spotting (KWS) in 65 nm CMOS with all weights stored on-chip. The work adapts the Recurrent Attention Model (RAM) algorithm to the KWS task, and employs an in-memory computing (IMC) architecture to achieve up to 9× savings in energy/decision and more than 23× savings in the EDP of decisions over the state-of-the-art IMC IC using the Google Speech dataset, while achieving the highest reported decision throughput of 18.32 k decisions/s.

10.1109/cicc48029.2020.9075923 article EN 2020 IEEE Custom Integrated Circuits Conference (CICC) 2020-03-01

We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior works on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. The activation-centrality of the approach enables retraining LLMs with no loss of expressivity, while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal...

10.48550/arxiv.2410.05437 preprint EN arXiv (Cornell University) 2024-10-07
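
A minimal NumPy sketch of the activation-side projection idea: calibrate a rank-k projection from activation statistics, then fold W·P into a smaller matrix so the cheaper path at inference follows from matrix multiplication associativity. The shapes, calibration data, and k below are illustrative.

```python
import numpy as np

def calibrate_projection(acts, k):
    """Pre-calibrate a rank-k projection from activation samples (rows = samples)."""
    cov = acts.T @ acts / acts.shape[0]
    _, vecs = np.linalg.eigh(cov)        # eigenvectors in ascending eigenvalue order
    return vecs[:, -k:]                  # top-k principal directions

d_in, d_out, k = 512, 512, 128
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
calib_acts = rng.standard_normal((4096, d_in))
P = calibrate_projection(calib_acts, k)

WP = W @ P                               # (d_out, k) folded weight, computed once
x = rng.standard_normal(d_in)
y_compressed = WP @ (P.T @ x)            # cheaper path used at inference
y_reference = W @ (P @ (P.T @ x))        # identical result by associativity
```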

In this work, we re-formulate the model compression problem into a customized compensation problem: given a compressed model, we aim to introduce residual low-rank paths to compensate for compression errors under customized requirements from users (e.g., tasks, compression ratios), resulting in greater flexibility in adjusting overall capacity without being constrained by specific compression formats. However, naively applying SVD to derive these paths causes suboptimal utilization of the low-rank representation capacity. Instead, we propose Training-free Eigenspace Low-Rank...

10.48550/arxiv.2410.21271 preprint EN arXiv (Cornell University) 2024-10-28
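
For contrast, a sketch of the naive SVD-based compensation that the abstract identifies as suboptimal: approximate the compression error with a rank-r residual path that is added back at inference without altering the compressed weight's format.

```python
import numpy as np

def lowrank_residual(W, W_compressed, rank):
    """Rank-r factors (A, B) approximating the compression error W - W_compressed."""
    U, S, Vt = np.linalg.svd(W - W_compressed, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # (d_out, r), columns scaled by singular values
    B = Vt[:rank]                     # (r, d_in)
    return A, B

# At inference: y ≈ W_compressed @ x + A @ (B @ x); the residual path restores
# accuracy while W_compressed keeps its original compressed format.
```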

Deep neural networks (DNNs) are powerful machine learning models but are typically deployed in large computing clusters due to their high computational and parameter complexity. Many biomedical applications require embedded inference on resource-constrained platforms, thus posing a challenge when considering the deployment of DNNs. One method to address this is via reduced-precision implementations. We use an analytical method to determine suitable minimum precision requirements for DNNs and show its application to the CHB-MIT EEG...

10.1109/biocas.2018.8584732 article EN 2018 IEEE Biomedical Circuits and Systems Conference (BioCAS) 2018-10-01