- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Parallel Computing and Optimization Techniques
- Low-power high-performance VLSI design
- Ferroelectric and Negative Capacitance Devices
- Adversarial Robustness in Machine Learning
- Advancements in Semiconductor Devices and Circuit Design
- Neural Networks and Applications
- Stochastic Gradient Optimization Techniques
- VLSI and Analog Circuit Testing
- Domain Adaptation and Few-Shot Learning
- Radiation Effects in Electronics
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
- Error Correcting Code Techniques
- CCD and CMOS Imaging Sensors
- Semiconductor materials and devices
- Neural dynamics and brain function
- IoT and Edge/Fog Computing
- Scientific Computing and Data Management
- Visual Attention and Saliency Detection
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- VLSI and FPGA Design Techniques
- COVID-19 diagnosis using AI
IBM (United States)
2017-2024
IBM Research - Thomas J. Watson Research Center
2018-2024
Alliance for Safe Kids
2018-2022
Purdue University West Lafayette
2012-2019
Seoul National University
2019
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training - one that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. This technique, PArameterized...
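The truncated abstract describes learnable clipping combined with low-precision quantization of activations. A minimal PyTorch sketch of that idea follows; it is not the paper's exact formulation - the 1-element learnable clipping parameter `alpha` and the straight-through backward rule are assumptions made for illustration:

```python
import torch

class PACTQuantize(torch.autograd.Function):
    """Clip activations to [0, alpha], then linearly quantize to k bits.
    Backward uses a straight-through estimator; the gradient of inputs
    clipped above alpha flows to alpha, making the clip level learnable."""

    @staticmethod
    def forward(ctx, x, alpha, k):
        ctx.save_for_backward(x, alpha)
        y = torch.clamp(x, 0.0, alpha.item())
        scale = (2 ** k - 1) / alpha          # k-bit levels over [0, alpha]
        return torch.round(y * scale) / scale

    @staticmethod
    def backward(ctx, grad_out):
        x, alpha = ctx.saved_tensors
        grad_x = grad_out * ((x >= 0) & (x <= alpha)).float()
        grad_alpha = (grad_out * (x > alpha).float()).sum().view(1)
        return grad_x, grad_alpha, None       # no gradient for k

# Illustrative usage: alpha registered as a learnable parameter.
alpha = torch.nn.Parameter(torch.tensor([6.0]))
x = torch.randn(8, requires_grad=True)
q = PACTQuantize.apply(x, alpha, 4)           # 4-bit activations
```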
Approximate computing has emerged as a new design paradigm that exploits the inherent error resilience of a wide range of application domains by allowing hardware implementations to forsake exact Boolean equivalence with algorithmic specifications. A slew of manual techniques for approximate design have been proposed in recent years, but very little effort has been devoted to automation.
Neuromorphic algorithms, which are comprised of highly complex, large-scale networks of artificial neurons, are increasingly used for a variety of recognition, classification, search, and vision tasks. However, their computational and energy requirements can be quite high; hence, their energy-efficient implementation is of great interest.
Approximate computing leverages the intrinsic resilience of applications to inexactness in their computations to achieve a desirable trade-off between efficiency (performance or energy) and acceptable quality of results. To broaden the applicability of approximate computing, we propose quality-programmable processors, in which the notion of quality is explicitly codified in the HW/SW interface, i.e., the instruction set. The ISA of a quality-programmable processor contains instructions associated with quality fields that specify the accuracy level that must be met during their execution...
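A toy sketch of what a per-instruction accuracy field might look like at the ISA level; the `Instr` encoding, the truncated adder, and the mapping from the `quality` field to dropped carry bits are all illustrative assumptions, not the paper's actual ISA:

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str        # e.g., "ADD"
    dst: int
    src1: int
    src2: int
    quality: int   # accuracy field: 0 = exact, larger = more approximate

def approx_add(a: int, b: int, lower_bits: int) -> int:
    """Truncated adder: carries from the low-order bits are ignored."""
    mask = ~((1 << lower_bits) - 1)
    return ((a & mask) + (b & mask)) | (a & ~mask)

def execute(instr: Instr, regs: list[int]) -> None:
    """Interpret one instruction, honoring its quality field."""
    if instr.op == "ADD":
        if instr.quality == 0:
            regs[instr.dst] = regs[instr.src1] + regs[instr.src2]
        else:
            regs[instr.dst] = approx_add(regs[instr.src1], regs[instr.src2],
                                         lower_bits=instr.quality)
```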
Diminishing benefits from technology scaling have pushed designers to look for new sources of computing efficiency. Multicores and heterogeneous accelerator-based architectures are a by-product of this quest to obtain improvements in the performance of computing platforms at similar or lower power budgets. In light of the need for innovations to sustain these improvements, we discuss approximate computing, a field that has attracted considerable interest over the last decade. While the core principles of approximate computing---computing...
Deep Neural Networks (DNNs) have demonstrated state-of-the-art performance on a broad range of tasks involving natural language, speech, image, and video processing, and are deployed in many real-world applications. However, DNNs impose significant computational challenges owing to the complexity of the networks and the amount of data they process, both of which are projected to grow in the future. To improve the efficiency of DNNs, we propose ScaleDeep, a dense, scalable server architecture whose memory and interconnect subsystems...
Many applications are inherently resilient to inexactness or approximations in their underlying computations. Approximate circuit design is an emerging paradigm that exploits this inherent resilience to realize hardware implementations that are highly efficient in energy and performance. In this work, we propose Substitute-And-SIMplIfy (SASIMI), a new systematic approach to the design and synthesis of approximate circuits. The key insight behind SASIMI is to identify signal pairs that assume the same value with high probability, and substitute...
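The substitution step can be illustrated with a small simulation-based sketch. The `eval_circuit` interface and the agreement threshold below are assumptions for illustration, and the quality-constrained acceptance check that would follow each substitution is omitted:

```python
import itertools
import random

def find_substitution_candidates(eval_circuit, signal_names, n_inputs,
                                 n_vectors=10_000, threshold=0.99):
    """eval_circuit maps an input bit-tuple to {signal_name: 0 or 1}.
    Returns (s1, s2, inverted) triples where s2 agrees with s1 (or its
    complement) on at least `threshold` of random vectors, making s2 a
    candidate for substitution and its driving logic for removal."""
    pairs = list(itertools.combinations(signal_names, 2))
    agree = dict.fromkeys(pairs, 0)
    for _ in range(n_vectors):
        vec = tuple(random.randint(0, 1) for _ in range(n_inputs))
        vals = eval_circuit(vec)
        for s1, s2 in pairs:
            agree[(s1, s2)] += vals[s1] == vals[s2]
    out = []
    for (s1, s2), hits in agree.items():
        frac = hits / n_vectors
        if frac >= threshold:
            out.append((s1, s2, False))   # substitute s2 with s1
        elif frac <= 1.0 - threshold:
            out.append((s1, s2, True))    # substitute s2 with NOT s1
    return out
```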
Supervised machine-learning algorithms are used to solve classification problems across the entire spectrum of computing platforms, from data centers to wearable devices, and place significant demands on their computational capabilities. In this paper, we propose scalable-effort classifiers, a new approach to optimizing the energy efficiency of supervised classifiers. We observe that the inherent classification difficulty varies widely across inputs in real-world datasets; only a small fraction of inputs truly require the full effort of the classifier,...
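One plausible reading of scalable-effort classification is a cascade with confidence-based early exit. The sketch below assumes scikit-learn-style estimators ordered by increasing complexity; the confidence threshold is illustrative, not a value from the paper:

```python
def scalable_effort_predict(stages, x, confidence=0.9):
    """Run increasingly complex classifiers on a 1-D feature vector x;
    stop at the first stage whose top-class probability clears the
    confidence threshold, so easy inputs consume little compute and
    only hard inputs reach the most expensive model."""
    for stage in stages[:-1]:
        probs = stage.predict_proba(x.reshape(1, -1))[0]
        if probs.max() >= confidence:
            return int(probs.argmax())
    # Fall back to the most accurate (and most expensive) stage.
    return int(stages[-1].predict(x.reshape(1, -1))[0])
```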
A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy, as well as 1b/2b (binary/ternary) integer for aggressive performance. At 1.5 GHz, the prototype...
Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, robust deep learning (DL) model accuracy equivalent to that of high-precision computation must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision by feeding and supporting...
Many applications produce acceptable results when their underlying computations are executed in an approximate manner. For such applications, approximate circuits enable hardware implementations that exhibit improved efficiency for a given quality. Previous efforts have largely focused on the design of combinational logic blocks such as adders and multipliers. In practice, however, designers are concerned with the quality of the outputs generated by a sequential circuit after several cycles of computation, rather than the embedded...
Spintronic memories are promising candidates for future on-chip storage due to their high density, non-volatility, and near-zero leakage. However, the energy consumed by read and write operations presents a major challenge to their use as energy-efficient memory. Leveraging the ability of many applications to tolerate impreciseness in their underlying computations and data, we explore approximate storage as a new approach to improving the energy efficiency of spintronic memories. We identify and characterize mechanisms in STT-MRAM bit-cells that...
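A behavioral sketch of the approximate-storage idea: write energy is reduced only for unprotected low-order bits, at the cost of an occasional failed switch. The bit-error probability and the MSB/LSB protection split are illustrative assumptions, not the device-level mechanisms the abstract refers to:

```python
import random

def approx_write(word: int, n_bits: int = 16, protected_msbs: int = 8,
                 flip_prob: float = 1e-3) -> int:
    """Model an approximate STT-MRAM write: write current/duration is
    reduced for the low-order bits, so each may fail to switch with
    probability flip_prob, while the MSBs are written at full energy."""
    out = word
    for i in range(n_bits - protected_msbs):   # unprotected LSBs only
        if random.random() < flip_prob:
            out ^= (1 << i)                    # this bit failed to switch
    return out
```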
Neural networks, with their remarkable ability to derive meaning from a large volume of complicated or imprecise data, can be used to extract patterns and detect trends that are too complex for the von Neumann computing paradigm. However, their considerable computational requirements stretch the capabilities of even modern computing platforms. We propose an approximate multiplier that exploits inherent application resilience to error and utilizes the notion of computation sharing to achieve improved energy consumption for neural networks. We also...
Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in the explosive growth of AI workloads across the spectrum of computing devices. However, their superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve their operational efficiency. Leveraging application-level insight into error resilience, we demonstrate how approximate computing (AxC) can significantly boost efficiency...
The growing prevalence and computational demands of Artificial Intelligence (AI) workloads have led to the widespread use of hardware accelerators in their execution. Scaling the performance of AI accelerators across generations is pivotal to the success of commercial deployments. The intrinsic error-resilient nature of AI workloads presents a unique opportunity for performance/energy improvement through precision scaling. Motivated by recent algorithmic advances in precision scaling for inference and training, we designed RaPiD...
Computing today is largely not about calculating a precise numerical end result. Instead, computing platforms are increasingly used to execute applications (such as search, analytics, sensor data processing, recognition, mining, and synthesis) for which "correctness" is defined as producing results that are good enough, or of sufficient quality. These applications are often intrinsically resilient to a large fraction of their computations being executed in an imprecise or approximate manner. However, the design continues to be...
Large-scale artificial neural networks have shown significant promise in addressing a wide range of classification and recognition applications. However, their large computational requirements stretch the capabilities of computing platforms. The fundamental components of these networks are neurons and their synapses. The core of a digital hardware neuron consists of a multiplier, an accumulator, and an activation function. Multipliers consume most of the processing energy in neurons, and thereby in hardware implementations of these networks. We propose an approximate...
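As a stand-in for the proposed multiplier (whose actual design the truncated abstract does not show), here is a behavioral sketch of an approximate shift-and-add multiplier inside a multiply-accumulate neuron. Skipping low-order partial products is an illustrative approximation, not the paper's computation-sharing scheme:

```python
def approx_multiply(a: int, b: int, skip_bits: int = 4) -> int:
    """Shift-and-add multiplier that ignores the lowest partial
    products of the multiplier operand, trading accuracy for fewer
    additions. Operates on the magnitudes; sign handled separately."""
    sign = -1 if (a < 0) ^ (b < 0) else 1
    a, b = abs(a), abs(b)
    result = 0
    for i in range(skip_bits, b.bit_length()):   # low partial products skipped
        if (b >> i) & 1:
            result += a << i
    return sign * result

def neuron(inputs, weights, bias=0):
    """Digital neuron: multiply-accumulate over integer inputs and
    weights, followed by a ReLU activation, using the approximate
    multiplier above in place of an exact one."""
    acc = sum(approx_multiply(x, w) for x, w in zip(inputs, weights)) + bias
    return max(0, acc)
```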
Deep Neural Networks (DNNs) have emerged as a powerful and versatile set of techniques to address challenging artificial intelligence (AI) problems. Applications in domains such as image/video processing, natural language processing, speech synthesis and recognition, genomics, and many others have embraced deep learning as the foundational technique. DNNs achieve superior accuracy for these applications using very large models which require 100s of MBs of data storage, ExaOps of computation, and high bandwidth for data movement. Despite advances...
General-purpose Graphics Processing Units (GPGPUs) are widely used for executing massively parallel workloads from various application domains. Feeding data to the hundreds to thousands of cores that current GPGPUs integrate places great demands on the memory hierarchy, fueling an ever-increasing demand for on-chip memory. In this work, we propose STAG, a high-density, energy-efficient GPGPU cache hierarchy design using a new spintronic technology called Domain Wall Memory (DWM). DWMs inherently offer...
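A behavioral sketch of why DWM density comes with a shift cost: bits sit on a racetrack and must be shifted under an access port before they can be read. The single-port tape model and the shift-count energy proxy are illustrative assumptions, not STAG's actual cache organization:

```python
class DWMTape:
    """Model of a Domain Wall Memory nanowire: bits must be shifted
    under a single access port, so read cost grows with the distance
    between the requested bit and the port's current position."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.port = 0      # index currently under the access port
        self.shifts = 0    # accumulated shifts (proxy for energy/latency)

    def read(self, index: int) -> int:
        self.shifts += abs(index - self.port)  # shift tape to the port
        self.port = index
        return self.bits[index]

# A cache layout that clusters frequently accessed lines near the port
# position would minimize tape.shifts across a reference stream.
tape = DWMTape([1, 0, 1, 1, 0, 0, 1, 0])
_ = tape.read(5), tape.read(6)
print(tape.shifts)   # 6 shifts: 5 to reach index 5, then 1 more
```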
Spiking Neural Networks (SNNs) are widely regarded as the third generation of artificial neural networks, and are expected to drive new classes of recognition, data analytics, and computer vision applications. However, large-scale SNNs (e.g., at the scale of the human visual cortex) are highly compute-intensive, requiring new approaches to improve their efficiency. Complementary to prior efforts that focus on parallel software design and specialized hardware, we propose AxSNN, the first effort to apply approximate computing to the computational...
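One way to picture approximation in SNN evaluation is to skip synaptic updates for neurons far from their firing threshold. This leaky integrate-and-fire sketch gathers only the near-threshold rows; the fixed margin criterion is an illustrative assumption rather than AxSNN's exact selection policy, which would also need to periodically revisit skipped neurons:

```python
import numpy as np

def axsnn_step(v, weights, in_spikes, v_th=1.0, leak=0.9, margin=0.5):
    """One approximate LIF timestep. Neurons whose membrane potential
    is more than `margin` below threshold are predicted not to fire,
    so their synaptic accumulation is skipped entirely, saving the
    associated multiply-accumulate work."""
    v = v * leak
    active = np.flatnonzero(v > v_th - margin)       # neurons near firing
    if active.size and len(in_spikes):
        v[active] += weights[np.ix_(active, in_spikes)].sum(axis=1)
    fired = np.flatnonzero(v >= v_th)
    v[fired] = 0.0                                   # reset spiking neurons
    return v, fired
```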