- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Parallel Computing and Optimization Techniques
- Low-power high-performance VLSI design
- Ferroelectric and Negative Capacitance Devices
- Adversarial Robustness in Machine Learning
- Advancements in Semiconductor Devices and Circuit Design
- Neural Networks and Applications
- Stochastic Gradient Optimization Techniques
- VLSI and Analog Circuit Testing
- Domain Adaptation and Few-Shot Learning
- Radiation Effects in Electronics
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
- Error Correcting Code Techniques
- CCD and CMOS Imaging Sensors
- Semiconductor materials and devices
- Neural dynamics and brain function
- IoT and Edge/Fog Computing
- Scientific Computing and Data Management
- Visual Attention and Saliency Detection
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- VLSI and FPGA Design Techniques
- COVID-19 diagnosis using AI
IBM (United States)
2017-2024
IBM Research - Thomas J. Watson Research Center
2018-2024
Alliance for Safe Kids
2018-2022
Purdue University West Lafayette
2012-2019
Seoul National University
2019
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed - but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training - one that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. This technique, PArameterized...
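The truncated abstract describes learnable clipping combined with low-precision quantization of activations. A minimal PyTorch sketch of that idea follows; it is not the paper's exact formulation - the 1-element learnable clipping parameter `alpha` and the straight-through backward rule are assumptions made for illustration:

```python
import torch

class PACTQuantize(torch.autograd.Function):
    """Clip activations to [0, alpha], then linearly quantize to k bits.
    Backward uses a straight-through estimator; the gradient of inputs
    clipped above alpha flows to alpha, making the clip level learnable."""

    @staticmethod
    def forward(ctx, x, alpha, k):
        ctx.save_for_backward(x, alpha)
        y = torch.clamp(x, 0.0, alpha.item())
        scale = (2 ** k - 1) / alpha          # k-bit levels over [0, alpha]
        return torch.round(y * scale) / scale

    @staticmethod
    def backward(ctx, grad_out):
        x, alpha = ctx.saved_tensors
        grad_x = grad_out * ((x >= 0) & (x <= alpha)).float()
        grad_alpha = (grad_out * (x > alpha).float()).sum().view(1)
        return grad_x, grad_alpha, None       # no gradient for k

# Illustrative usage: alpha registered as a learnable parameter.
alpha = torch.nn.Parameter(torch.tensor([6.0]))
x = torch.randn(8, requires_grad=True)
q = PACTQuantize.apply(x, alpha, 4)           # 4-bit activations
```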
Approximate computing has emerged as a new design paradigm that exploits the inherent error resilience of a wide range of application domains by allowing hardware implementations to forsake exact Boolean equivalence with algorithmic specifications. A slew of manual techniques for approximate design have been proposed in recent years, but very little effort has been devoted to automation.
Neuromorphic algorithms, which are comprised of highly complex, large-scale networks of artificial neurons, are increasingly used for a variety of recognition, classification, search, and vision tasks. However, their computational and energy requirements can be quite high; hence, their energy-efficient implementation is of great interest.
Approximate computing leverages the intrinsic resilience of applications to inexactness in their computations to achieve a desirable trade-off between efficiency (performance or energy) and acceptable quality of results. To broaden the applicability of approximate computing, we propose quality-programmable processors, in which the notion of quality is explicitly codified in the HW/SW interface, i.e., the instruction set. The ISA of a quality-programmable processor contains instructions associated with quality fields that specify the accuracy level that must be met during their execution...
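A toy sketch of what a per-instruction accuracy field might look like at the ISA level; the `Instr` encoding, the truncated adder, and the mapping from the `quality` field to dropped carry bits are all illustrative assumptions, not the paper's actual ISA:

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str        # e.g., "ADD"
    dst: int
    src1: int
    src2: int
    quality: int   # accuracy field: 0 = exact, larger = more approximate

def approx_add(a: int, b: int, lower_bits: int) -> int:
    """Truncated adder: carries from the low-order bits are ignored."""
    mask = ~((1 << lower_bits) - 1)
    return ((a & mask) + (b & mask)) | (a & ~mask)

def execute(instr: Instr, regs: list[int]) -> None:
    """Interpret one instruction, honoring its quality field."""
    if instr.op == "ADD":
        if instr.quality == 0:
            regs[instr.dst] = regs[instr.src1] + regs[instr.src2]
        else:
            regs[instr.dst] = approx_add(regs[instr.src1], regs[instr.src2],
                                         lower_bits=instr.quality)
```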
Diminishing benefits from technology scaling have pushed designers to look for new sources of computing efficiency. Multicores and heterogeneous accelerator-based architectures are a by-product of this quest to obtain improvements in the performance of computing platforms at similar or lower power budgets. In light of the need for innovations to sustain these improvements, we discuss approximate computing, a field that has attracted considerable interest over the last decade. While the core principles of approximate computing---computing...
Deep Neural Networks (DNNs) have demonstrated state-of-the-art performance on a broad range of tasks involving natural language, speech, image, and video processing, and are deployed in many real-world applications. However, DNNs impose significant computational challenges owing to the complexity of the networks and the amount of data they process, both of which are projected to grow in the future. To improve the efficiency of DNNs, we propose ScaleDeep, a dense, scalable server architecture whose memory and interconnect subsystems...
Many applications are inherently resilient to inexactness or approximations in their underlying computations. Approximate circuit design is an emerging paradigm that exploits this inherent resilience to realize hardware implementations that are highly efficient in energy and performance. In this work, we propose Substitute-And-SIMplIfy (SASIMI), a new systematic approach to the design and synthesis of approximate circuits. The key insight behind SASIMI is to identify signal pairs that assume the same value with high probability, and substitute...
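The substitution step can be illustrated with a small simulation-based sketch. The `eval_circuit` interface and the agreement threshold below are assumptions for illustration, and the quality-constrained acceptance check that would follow each substitution is omitted:

```python
import itertools
import random

def find_substitution_candidates(eval_circuit, signal_names, n_inputs,
                                 n_vectors=10_000, threshold=0.99):
    """eval_circuit maps an input bit-tuple to {signal_name: 0 or 1}.
    Returns (s1, s2, inverted) triples where s2 agrees with s1 (or its
    complement) on at least `threshold` of random vectors, making s2 a
    candidate for substitution and its driving logic for removal."""
    pairs = list(itertools.combinations(signal_names, 2))
    agree = dict.fromkeys(pairs, 0)
    for _ in range(n_vectors):
        vec = tuple(random.randint(0, 1) for _ in range(n_inputs))
        vals = eval_circuit(vec)
        for s1, s2 in pairs:
            agree[(s1, s2)] += vals[s1] == vals[s2]
    out = []
    for (s1, s2), hits in agree.items():
        frac = hits / n_vectors
        if frac >= threshold:
            out.append((s1, s2, False))   # substitute s2 with s1
        elif frac <= 1.0 - threshold:
            out.append((s1, s2, True))    # substitute s2 with NOT s1
    return out
```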
Supervised machine-learning algorithms are used to solve classification problems across the entire spectrum of computing platforms, from data centers to wearable devices, and place significant demands on their computational capabilities. In this paper, we propose scalable-effort classifiers, a new approach to optimizing the energy efficiency of supervised classifiers. We observe that the inherent classification difficulty varies widely across inputs in real-world datasets; only a small fraction of inputs truly require the full effort of the classifier,...
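One plausible reading of scalable-effort classification is a cascade with confidence-based early exit. The sketch below assumes scikit-learn-style estimators ordered by increasing complexity; the confidence threshold is illustrative, not a value from the paper:

```python
def scalable_effort_predict(stages, x, confidence=0.9):
    """Run increasingly complex classifiers on a 1-D feature vector x;
    stop at the first stage whose top-class probability clears the
    confidence threshold, so easy inputs consume little compute and
    only hard inputs reach the most expensive model."""
    for stage in stages[:-1]:
        probs = stage.predict_proba(x.reshape(1, -1))[0]
        if probs.max() >= confidence:
            return int(probs.argmax())
    # Fall back to the most accurate (and most expensive) stage.
    return int(stages[-1].predict(x.reshape(1, -1))[0])
```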
A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy, as well as 1b/2b (binary/ternary) integer for aggressive performance. At 1.5 GHz, the prototype...
Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, robust deep learning (DL) model accuracy equivalent to that of high-precision computation must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision by feeding and supporting...
Many applications produce acceptable results when their underlying computations are executed in an approximate manner. For such applications, approximate circuits enable hardware implementations that exhibit improved efficiency for a given quality. Previous efforts have largely focused on the design of combinational logic blocks such as adders and multipliers. In practice, however, designers are concerned with the quality of the outputs generated by a sequential circuit after several cycles of computation, rather than the embedded...
Spintronic memories are promising candidates for future on-chip storage due to their high density, non-volatility, and near-zero leakage. However, the energy consumed by read and write operations presents a major challenge to their use as energy-efficient memory. Leveraging the ability of many applications to tolerate impreciseness in their underlying computations and data, we explore approximate storage as a new approach to improving the energy efficiency of spintronic memories. We identify and characterize mechanisms in STT-MRAM bit-cells that...
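A behavioral sketch of the approximate-storage idea: write energy is reduced only for unprotected low-order bits, at the cost of an occasional failed switch. The bit-error probability and the MSB/LSB protection split are illustrative assumptions, not the device-level mechanisms the abstract refers to:

```python
import random

def approx_write(word: int, n_bits: int = 16, protected_msbs: int = 8,
                 flip_prob: float = 1e-3) -> int:
    """Model an approximate STT-MRAM write: write current/duration is
    reduced for the low-order bits, so each may fail to switch with
    probability flip_prob, while the MSBs are written at full energy."""
    out = word
    for i in range(n_bits - protected_msbs):   # unprotected LSBs only
        if random.random() < flip_prob:
            out ^= (1 << i)                    # this bit failed to switch
    return out
```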
Neural networks, with their remarkable ability to derive meaning from a large volume of complicated or imprecise data, can be used to extract patterns and detect trends that are too complex for the von Neumann computing paradigm. However, their considerable computational requirements stretch the capabilities of even modern computing platforms. We propose an approximate multiplier that exploits inherent application resilience to error and utilizes the notion of computation sharing to achieve improved energy consumption for neural networks. We also...
Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in the explosive growth of AI workloads across the spectrum of computing devices. However, their superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve their operational efficiency. Leveraging application-level insight into error resilience, we demonstrate how approximate computing (AxC) can significantly boost efficiency...
The growing prevalence and computational demands of Artificial Intelligence (AI) workloads have led to the widespread use of hardware accelerators in their execution. Scaling the performance of AI accelerators across generations is pivotal to the success of commercial deployments. The intrinsic error-resilient nature of AI workloads presents a unique opportunity for performance/energy improvement through precision scaling. Motivated by recent algorithmic advances in precision scaling for inference and training, we designed RaPiD...
Computing today is largely not about calculating a precise numerical end result. Instead, computing platforms are increasingly used to execute applications (such as search, analytics, sensor data processing, recognition, mining, and synthesis) for which "correctness" is defined as producing results that are good enough, or of sufficient quality. These applications are often intrinsically resilient to a large fraction of their computations being executed in an imprecise or approximate manner. However, the design continues to be...
Large-scale artificial neural networks have shown significant promise in addressing a wide range of classification and recognition applications. However, their large computational requirements stretch the capabilities of computing platforms. The fundamental components of these networks are neurons and their synapses. The core of a digital hardware neuron consists of a multiplier, an accumulator, and an activation function. Multipliers consume most of the processing energy in neurons, and thereby in hardware implementations of these networks. We propose an approximate...
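As a stand-in for the proposed multiplier (whose actual design the truncated abstract does not show), here is a behavioral sketch of an approximate shift-and-add multiplier inside a multiply-accumulate neuron. Skipping low-order partial products is an illustrative approximation, not the paper's computation-sharing scheme:

```python
def approx_multiply(a: int, b: int, skip_bits: int = 4) -> int:
    """Shift-and-add multiplier that ignores the lowest partial
    products of the multiplier operand, trading accuracy for fewer
    additions. Operates on the magnitudes; sign handled separately."""
    sign = -1 if (a < 0) ^ (b < 0) else 1
    a, b = abs(a), abs(b)
    result = 0
    for i in range(skip_bits, b.bit_length()):   # low partial products skipped
        if (b >> i) & 1:
            result += a << i
    return sign * result

def neuron(inputs, weights, bias=0):
    """Digital neuron: multiply-accumulate over integer inputs and
    weights, followed by a ReLU activation, using the approximate
    multiplier above in place of an exact one."""
    acc = sum(approx_multiply(x, w) for x, w in zip(inputs, weights)) + bias
    return max(0, acc)
```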
Deep Neural Networks (DNNs) have emerged as a powerful and versatile set of techniques to address challenging artificial intelligence (AI) problems. Applications in domains such as image/video processing, natural language processing, speech synthesis and recognition, genomics, and many others have embraced deep learning as the foundational technique. DNNs achieve superior accuracy for these applications using very large models which require 100s of MBs of data storage, ExaOps of computation, and high bandwidth for data movement. Despite advances...
General-purpose Graphics Processing Units (GPGPUs) are widely used for executing massively parallel workloads from various application domains. Feeding data to the hundreds to thousands of cores that current GPGPUs integrate places great demands on the memory hierarchy, fueling an ever-increasing demand for on-chip memory. In this work, we propose STAG, a high-density, energy-efficient GPGPU cache hierarchy design using a new spintronic technology called Domain Wall Memory (DWM). DWMs inherently offer...
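A behavioral sketch of why DWM density comes with a shift cost: bits sit on a racetrack and must be shifted under an access port before they can be read. The single-port tape model and the shift-count energy proxy are illustrative assumptions, not STAG's actual cache organization:

```python
class DWMTape:
    """Model of a Domain Wall Memory nanowire: bits must be shifted
    under a single access port, so read cost grows with the distance
    between the requested bit and the port's current position."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.port = 0      # index currently under the access port
        self.shifts = 0    # accumulated shifts (proxy for energy/latency)

    def read(self, index: int) -> int:
        self.shifts += abs(index - self.port)  # shift tape to the port
        self.port = index
        return self.bits[index]

# A cache layout that clusters frequently accessed lines near the port
# position would minimize tape.shifts across a reference stream.
tape = DWMTape([1, 0, 1, 1, 0, 0, 1, 0])
_ = tape.read(5), tape.read(6)
print(tape.shifts)   # 6 shifts: 5 to reach index 5, then 1 more
```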
Spiking Neural Networks (SNNs) are widely regarded as the third generation of artificial neural networks, and are expected to drive new classes of recognition, data analytics, and computer vision applications. However, large-scale SNNs (e.g., at the scale of the human visual cortex) are highly compute-intensive, requiring new approaches to improve their efficiency. Complementary to prior efforts that focus on parallel software design and specialized hardware, we propose AxSNN, the first effort to apply approximate computing to the computational...
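One way to picture approximation in SNN evaluation is to skip synaptic updates for neurons far from their firing threshold. This leaky integrate-and-fire sketch gathers only the near-threshold rows; the fixed margin criterion is an illustrative assumption rather than AxSNN's exact selection policy, which would also need to periodically revisit skipped neurons:

```python
import numpy as np

def axsnn_step(v, weights, in_spikes, v_th=1.0, leak=0.9, margin=0.5):
    """One approximate LIF timestep. Neurons whose membrane potential
    is more than `margin` below threshold are predicted not to fire,
    so their synaptic accumulation is skipped entirely, saving the
    associated multiply-accumulate work."""
    v = v * leak
    active = np.flatnonzero(v > v_th - margin)       # neurons near firing
    if active.size and len(in_spikes):
        v[active] += weights[np.ix_(active, in_spikes)].sum(axis=1)
    fired = np.flatnonzero(v >= v_th)
    v[fired] = 0.0                                   # reset spiking neurons
    return v, fired
```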