- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Advanced Neural Network Applications
- Semiconductor Materials and Devices
- Phase Change Materials Research
- Adsorption and Cooling Systems
- Solar Thermal and Photovoltaic Systems
- Adversarial Robustness in Machine Learning
- Parallel Computing and Optimization Techniques
- Low-Power High-Performance VLSI Design
- Machine Learning and Data Classification
- VLSI and Analog Circuit Testing
- Neuroscience and Neural Engineering
- Imbalanced Data Classification Techniques
- Advancements in Semiconductor Devices and Circuit Design
- Advanced Data Storage Technologies
- Magnetic Properties of Thin Films
- Solar-Powered Water Purification Methods
- Music and Audio Processing
- Anomaly Detection Techniques and Applications
- Heat Transfer and Optimization
- Explainable Artificial Intelligence (XAI)
- Data Stream Mining Techniques
- Music Technology and Sound Studies
- Speech and Audio Processing
IBM (United States)
2021-2025
IBM Research - Thomas J. Watson Research Center
2020-2024
Bhabha Hospital
2023
Indian Institute of Technology Delhi
2009-2023
Athlone Institute of Technology
2021-2023
Visa (United Kingdom)
2022
Visa (United States)
2022
Purdue University West Lafayette
2017-2021
Bharati Vidyapeeth Deemed University
2019
Motilal Nehru National Institute of Technology
2016
In this paper we propose a novel model for unconditional audio generation based on generating one sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in temporal sequences over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that they are preferred...
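A minimal numpy sketch of the sample-at-a-time generation loop this abstract describes: a stateful recurrent tier advances once per frame, while a memory-less MLP tier emits one quantized sample at a time conditioned on it. All shapes, the tanh recurrence, and the random weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME, HID, Q = 16, 32, 256          # frame size, hidden width, 8-bit quantized audio

# Illustrative parameters (random stand-ins for trained weights).
W_h = rng.normal(0, 0.1, (HID, HID))           # frame-level RNN recurrence
W_x = rng.normal(0, 0.1, (HID, FRAME))         # frame-level RNN input
W_mlp = rng.normal(0, 0.1, (Q, HID + FRAME))   # sample-level MLP readout

def generate(n_frames):
    """Generate audio one sample at a time: a stateful frame-level RNN
    conditions a memory-less sample-level MLP (hierarchical structure)."""
    h = np.zeros(HID)
    samples = list(rng.integers(0, Q, FRAME))   # seed frame
    for _ in range(n_frames):
        frame = np.array(samples[-FRAME:]) / Q - 0.5
        h = np.tanh(W_h @ h + W_x @ frame)      # slow tier: one step per frame
        for _ in range(FRAME):                  # fast tier: one step per sample
            recent = np.array(samples[-FRAME:]) / Q - 0.5
            logits = W_mlp @ np.concatenate([h, recent])
            p = np.exp(logits - logits.max()); p /= p.sum()
            samples.append(rng.choice(Q, p=p))  # sample next quantized value
    return np.array(samples)

print(generate(4).shape)  # seed frame plus 4 generated frames of samples
```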
In-memory computing is a promising approach to addressing the processor-memory data transfer bottleneck in computing systems. We propose spin-transfer torque compute-in-memory (STT-CiM), a design for in-memory computing with spin-transfer torque magnetic RAM (STT-MRAM). The unique properties of spintronic memory allow multiple wordlines within an array to be simultaneously enabled, opening up the possibility of directly sensing functions of the values stored in multiple rows using a single access. We propose modifications to STT-MRAM peripheral circuits that leverage this...
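A sketch of the sensing idea: with two wordlines enabled, the bitline sees both cells in parallel, so the sensed level reflects the sum of the stored bits, and different sense thresholds recover different logic functions in one array access. The bit encoding and thresholds below are modeling assumptions, not the paper's circuit.

```python
import numpy as np

# Two rows of an STT-MRAM array (treating 1 as the conducting state is a
# modeling choice here; real resistance encodings vary by design).
row_a = np.array([1, 0, 1, 1, 0, 0, 1, 0])
row_b = np.array([1, 1, 0, 1, 0, 1, 0, 0])

# Enabling both wordlines puts the two cells on each bitline in parallel,
# so the sensed current reflects the *sum* of the stored bits (0, 1, or 2).
bitline_level = row_a + row_b

# Distinct sensing thresholds recover logic functions in a single access.
and_out = (bitline_level >= 2).astype(int)   # both cells conduct -> AND
or_out  = (bitline_level >= 1).astype(int)   # at least one conducts -> OR
xor_out = (bitline_level == 1).astype(int)   # exactly one conducts -> XOR

assert (and_out == (row_a & row_b)).all()
assert (or_out  == (row_a | row_b)).all()
assert (xor_out == (row_a ^ row_b)).all()
```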
Resistive crossbars designed with nonvolatile memory devices have emerged as promising building blocks for deep neural network (DNN) hardware, due to their ability to compactly and efficiently realize vector-matrix multiplication (VMM), the dominant computational kernel in DNNs. However, a key challenge with resistive crossbars is that they suffer from a range of device- and circuit-level nonidealities, such as driver resistance, sensing resistance, sneak paths, interconnect parasitics, and nonlinearities in the peripheral circuits,...
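A minimal sketch of how a crossbar realizes VMM: weights are mapped to cell conductances, inputs are applied as row voltages, and Kirchhoff's current law makes each column current a dot product. The conductance range and matrix values are illustrative assumptions.

```python
import numpy as np

# A 4x3 weight matrix mapped to crossbar conductances (assumption: weights
# are non-negative and scaled into a realizable conductance range, in siemens).
G_MIN, G_MAX = 1e-6, 1e-4
W = np.array([[0.2, 0.8, 0.5],
              [0.9, 0.1, 0.4],
              [0.3, 0.7, 0.6],
              [0.5, 0.5, 0.2]])
G = G_MIN + W * (G_MAX - G_MIN)          # conductance per cell

V = np.array([0.1, 0.2, 0.05, 0.15])     # input activations as row voltages

# Kirchhoff's current law: each column current is the dot product of the
# input voltage vector with that column's conductances -> analog VMM.
I = V @ G                                 # column currents, one per output

# Digitizing and rescaling recovers the numerical result of W^T · x.
print(I)
print((V @ W) * (G_MAX - G_MIN) + V.sum() * G_MIN)   # matches I exactly
```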
Traditional computing systems based on the von Neumann architecture are fundamentally bottlenecked by data transfers between processors and memory. The emergence of data-intensive workloads, such as machine learning (ML), creates an urgent need to address this bottleneck by designing platforms that utilize the principle of colocated memory and processing units. Such an approach, known as "in-memory computing," can potentially eliminate data movement costs by performing computations inside the memory array itself. Crossbars of resistive nonvolatile memory (NVM)...
Resistive crossbars have shown strong potential as the building blocks of future neural fabrics, due to their ability to natively execute vector-matrix multiplication (the dominant computational kernel in DNNs). However, a key challenge that arises in resistive crossbars is that non-idealities in the synaptic devices, interconnects, and peripheral circuits lead to errors in the computations performed. When large-scale DNNs are executed on crossbar-based systems, these errors compound and result in unacceptable degradation in application-level...
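To see how per-layer errors compound, here is a toy experiment with a hypothetical multiplicative Gaussian error standing in for the non-idealities; the network, noise model, and magnitudes are all illustrative assumptions rather than a fitted device model.

```python
import numpy as np

rng = np.random.default_rng(1)

def crossbar_vmm(W, x, sigma):
    """Ideal VMM plus a multiplicative error term standing in for device and
    circuit non-idealities (illustrative Gaussian model, not a fitted one)."""
    noise = rng.normal(1.0, sigma, W.shape)
    return (W * noise) @ x

# A toy 6-layer network: the same relative error applied per layer compounds.
layers = [rng.normal(0, 0.3, (64, 64)) for _ in range(6)]
x = rng.normal(0, 1, 64)

for sigma in (0.0, 0.02, 0.05):
    y_ideal, y_noisy = x.copy(), x.copy()
    for W in layers:
        y_ideal = np.tanh(W @ y_ideal)
        y_noisy = np.tanh(crossbar_vmm(W, y_noisy, sigma))
    rel_err = np.linalg.norm(y_noisy - y_ideal) / np.linalg.norm(y_ideal)
    print(f"sigma={sigma:.2f} -> relative output error {rel_err:.3f}")
```

Even a small per-layer error grows with depth, which is the compounding effect the abstract points to.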
Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in the explosive growth of AI workloads across the spectrum of computing devices. However, their superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve operational efficiency. Leveraging application-level insights into error resilience, we demonstrate how approximate computing (AxC) can significantly boost efficiency...
The growing prevalence and computational demands of Artificial Intelligence (AI) workloads have led to the widespread use of hardware accelerators in their execution. Scaling the performance of AI accelerators across generations is pivotal to the success of commercial deployments. The intrinsic error-resilient nature of AI workloads presents a unique opportunity for performance/energy improvement through precision scaling. Motivated by recent algorithmic advances in precision scaling for inference and training, we designed RaPiD...
Deep neural networks (DNNs) have gained tremendous popularity in recent years due to their ability to achieve superhuman accuracy on a wide variety of machine learning tasks. However, the compute and memory requirements of DNNs have grown rapidly, creating a need for energy-efficient hardware. Resistive crossbars have attracted significant interest in the design of next-generation DNN accelerators, as they can natively execute massively parallel vector-matrix multiplications within dense memory arrays. However, crossbar-based computations face major...
We introduce a highly heterogeneous and programmable compute-in-memory (CIM) accelerator architecture for deep neural network (DNN) inference. This architecture combines spatially distributed CIM memory-array "tiles" for weight-stationary, energy-efficient multiply-accumulate (MAC) operations with special-function compute cores for auxiliary digital computation. Massively parallel vectors of neuron-activation data are exchanged over short distances using a dense and efficient circuit-switched 2-D mesh,...
The use of lower precision has emerged as a popular technique to optimize the compute and storage requirements of complex deep neural networks (DNNs). In the quest for lower precision, recent studies have shown that ternary DNNs (which represent weights and activations by signed ternary values) are a promising sweet spot, achieving accuracy close to full-precision networks on complex tasks. We propose TiM-DNN, a programmable in-memory accelerator specifically designed to execute ternary DNNs. TiM-DNN supports various ternary representations, including...
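A minimal sketch of one common ternary representation, mapping full-precision weights to signed values {-a, 0, +a}; the threshold rule and per-tensor scale below are illustrative choices, and the representations TiM-DNN actually supports may differ.

```python
import numpy as np

def ternarize(w, t=0.05):
    """Map full-precision weights to signed ternary values {-a, 0, +a}.
    Threshold t zeroes small weights; a is a per-tensor scale (assumptions)."""
    mask = np.abs(w) > t
    a = np.abs(w[mask]).mean() if mask.any() else 0.0   # per-tensor scale
    return np.sign(w) * mask * a

rng = np.random.default_rng(2)
w = rng.normal(0, 0.1, (4, 4))
wt = ternarize(w)
print(np.unique(np.round(wt, 4)))   # exactly three levels: -a, 0, +a
```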
Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by the high computation and storage requirements of DNNs, especially for energy-constrained inference at the edge using wearable and IoT devices. One promising approach to alleviating the computational challenges is implementing DNNs in low-precision fixed-point (<16-bit) representation. However, the quantization error inherent in any...
Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. The enormous computational demand posed by DNNs is a key challenge for computing system designers and has most commonly been addressed through the design of DNN accelerators. However, these specialized accelerators, which utilize large quantities of multiply-accumulate units and on-chip memory, are prohibitive in area- and cost-constrained systems such as wearable devices and IoT sensors. In this work, we take...
Fixed-point (FxP) implementations are prominently used to realize Deep Neural Networks (DNNs) efficiently on energy-constrained platforms. The choice of bit-width is often constrained by the ability of FxP to represent the entire range of numbers in a data structure with sufficient resolution. At low bit-widths (<8 bits), state-of-the-art DNNs invariably suffer a loss in classification accuracy due to quantization and saturation errors.
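The range-versus-resolution tension is easy to see in a small sketch: with a fixed total bit-width, allocating more fractional bits shrinks the representable range (saturation) while fewer fractional bits coarsens the step size (quantization error). The 6-bit format and test values below are illustrative.

```python
import numpy as np

def quantize_fxp(x, bits, frac_bits):
    """Signed fixed-point with 'bits' total and 'frac_bits' fractional bits.
    Range is [-2^(bits-1-frac_bits), 2^(bits-1-frac_bits)); step is 2^-frac_bits."""
    step = 2.0 ** -frac_bits
    lo = -(2 ** (bits - 1)) * step
    hi = (2 ** (bits - 1) - 1) * step
    return np.clip(np.round(x / step) * step, lo, hi)

x = np.array([0.013, 0.42, 1.7, 6.3])
# More fractional bits -> finer resolution but smaller range (saturation);
# fewer fractional bits -> wider range but coarser steps (quantization error).
for f in (1, 3, 5):
    print(f"frac_bits={f}:", quantize_fxp(x, bits=6, frac_bits=f))
```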
Memory Augmented Neural Networks (MANNs) enhance a deep neural network with an external differentiable memory, enabling them to perform complex tasks well beyond the capabilities of conventional networks. We identify a unique challenge that arises in MANNs due to soft reads and writes, each of which requires access to all memory locations. This characteristic of MANN workloads severely limits their performance on CPUs, GPUs, and classical accelerators. We present the first effort to design a hardware architecture that improves...
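A minimal sketch of why soft reads and writes touch every location: each access computes attention weights over the whole memory and then blends or updates all rows. The dot-product similarity and memory dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(3)
N, D = 128, 16                      # memory locations x cell width
memory = rng.normal(0, 1, (N, D))

# A soft read: score a key against EVERY location, then take a weighted
# sum over the whole memory. All N rows are touched for a single read,
# which is the access pattern the abstract identifies as the bottleneck.
key = rng.normal(0, 1, D)
weights = softmax(memory @ key / np.sqrt(D))   # attention over all locations
read_value = weights @ memory                  # (D,) blended readout

# A soft write similarly updates every location in proportion to its weight.
write_vec = rng.normal(0, 1, D)
memory += np.outer(weights, write_vec)
print(read_value.shape, memory.shape)
```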
We propose a non-volatile memory based on cross-coupled reconfigurable ferroelectric transistors (R-FEFETs) which features differential read along with low-power computation-in-memory (CiM). Exploiting the dynamic modulation of hysteresis in R-FEFETs, we achieve the aforementioned functionalities with just 2 access transistors (in addition to the R-FEFETs). The proposed design not only enhances the sense margin during read, but also enables natural computation of AND and NOR logic functions between two bits stored in the array,...
Intrinsic application resilience, a property exhibited by many emerging domains, allows designers to optimize computing platforms by approximating selected computations within an application without any perceivable loss in its output quality. At the circuit level, this is often achieved by designing circuits that are more efficient but realize slightly modified functionality. Most prior efforts on approximate circuit design hardwire the degree of approximation into the implementation. This severely limits their...
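A sketch of the contrast the abstract draws: a common approximation knob for adders is to ignore the k least-significant bits, and exposing k at runtime (rather than hardwiring it) is the kind of configurability being argued for. This is a generic textbook-style approximation, not the paper's specific circuit.

```python
def approx_add(a, b, k):
    """Add two non-negative integers while ignoring the k least-significant
    bits. k = 0 gives an exact sum; larger k trades accuracy for what would
    be, in hardware, simpler and lower-energy logic (hypothetical savings)."""
    mask = ~((1 << k) - 1)
    return (a & mask) + (b & mask)

a, b = 1234, 5678
for k in range(0, 9, 2):            # dial the quality knob at runtime
    s = approx_add(a, b, k)
    print(f"k={k}: sum={s}, error={abs((a + b) - s)}")
```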
In-memory computing is a promising approach to alleviating the processor-memory data transfer bottleneck in computing systems. While spintronics has attracted great interest as a non-volatile memory technology, recent work has shown that its unique properties can also enable in-memory computing. We summarize efforts in this direction and describe three different designs that enhance STT-MRAM to perform logic, arithmetic, and vector operations and to evaluate transcendental functions within memory arrays.
The rapid emergence of AI models, specifically large language models (LLMs) requiring large amounts of compute, drives the need for dedicated inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across the layers of a model, the number of tokens, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within the system-level specifications of discrete accelerator cards, including...