Radhika Jain
- Parallel Computing and Optimization Techniques
- Ferroelectric and Negative Capacitance Devices
- Advancements in Semiconductor Devices and Circuit Design
- Advanced Neural Network Applications
- VLSI and Analog Circuit Testing
- Low-Power High-Performance VLSI Design
IBM Research - Thomas J. Watson Research Center
2021-2024
Low-precision computation is the key enabling factor for achieving high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, robust deep learning (DL) model accuracy equivalent to that of high-precision computation must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision by feeding and supporting...
Reduced-precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions—FP16, Hybrid-FP8 (HFP8), INT4, and INT2—to meet the diverse application demands of training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without...
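The abstract above refers to INT4 inference. As a rough illustration of the underlying idea (not the chip's actual datapath or the paper's quantization scheme), here is a minimal sketch of symmetric per-tensor INT4 quantization, in which FP32 values are mapped to 4-bit integer codes in [-8, 7] via a single scale factor:

```python
import numpy as np

def quantize_int4(x, scale=None):
    """Symmetric per-tensor quantization to the INT4 range [-8, 7]."""
    if scale is None:
        # Map the largest magnitude in the tensor to code 7.
        scale = np.max(np.abs(x)) / 7.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT4 codes."""
    return q.astype(np.float32) * scale

# Hypothetical weight values for illustration only.
w = np.array([0.9, -0.35, 0.05, -0.7], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
print(q)                           # 4-bit codes stored in int8, e.g. [ 7 -3  0 -5]
print(np.max(np.abs(w - w_hat)))   # rounding error is bounded by scale/2
```

Mixed-precision accelerators of this kind keep weights and activations in such compact integer formats so that multiply-accumulate arrays can pack far more operations per watt than FP16/FP32 datapaths, while accuracy is preserved by careful choice of scales (often per-channel rather than per-tensor).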
The rapid emergence of AI models, specifically large language models (LLMs) requiring vast amounts of compute, drives the need for dedicated inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across the layers of a model, the number of tokens, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within the system-level specifications of discrete accelerator cards, including...