- Parallel Computing and Optimization Techniques
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Embedded Systems Design Techniques
- Advanced Neural Network Applications
- Interconnection Networks and Systems
- Offshore Engineering and Technologies
- MRI in Cancer Diagnosis
- Semiconductor Materials and Devices
- GaN-based semiconductor devices and materials
- Indoor and Outdoor Localization Technologies
- Millimeter-Wave Propagation and Modeling
- Silicon Carbide Semiconductor Technologies
- Oil and Gas Production Techniques
- CCD and CMOS Imaging Sensors
- Photonic and Optical Devices
- Advanced MRI Techniques and Applications
- Advanced Data Storage Technologies
- Neural Networks and Reservoir Computing
- Radiation Effects in Electronics
- Adversarial Robustness in Machine Learning
- Recommender Systems and Techniques
- Distributed and Parallel Computing Systems
- Energy Harvesting in Wireless Networks
- Drilling and Well Engineering
Arizona State University
2023-2025
Harvard University Press
2021-2024
ShanghaiTech University
2024
Institute of Wood Science and Technology
2020-2024
New York University
2016-2023
Harvard University
2020-2022
University at Buffalo, State University of New York
2022
Zhuhai Institute of Advanced Technology
2019-2021
Los Angeles Medical Center
2015
Kaiser Permanente
2015
Due to their growing popularity and computational cost, deep neural networks (DNNs) are being targeted for hardware acceleration. A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic array-based matrix multiplication unit at its core. This paper deals with the design of fault-tolerant DNN accelerators in high defect rate technologies. To this end, we empirically show that the classification accuracy of a baseline TPU drops significantly even at extremely...
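As background, the systolic array at the core of such accelerators computes a matrix product by streaming activations through a grid of multiply-and-accumulate (MAC) processing elements that each hold one weight. A minimal functional sketch (the name `systolic_matmul` and the loop structure are illustrative assumptions, not the TPU's actual microarchitecture):

```python
def systolic_matmul(X, W):
    # Weight-stationary sketch of Y = X @ W: PE (r, c) holds weight W[r][c];
    # activations stream through, and partial sums accumulate down each column.
    n, k = len(X), len(X[0])          # X: n x k activations
    m = len(W[0])                     # W: k x m weights (held in the PE grid)
    Y = [[0] * m for _ in range(n)]
    for i in range(n):                # one activation row per wavefront
        for r in range(k):            # PE rows (weight rows)
            for c in range(m):        # PE columns
                Y[i][c] += X[i][r] * W[r][c]   # one MAC per PE visit
    return Y

assert systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

A fault in a single PE corrupts every product that flows through it, which is why a defective array can degrade classification accuracy so sharply.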
Hardware accelerators are being increasingly deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy, even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34%-57% energy savings on state-of-the-art speech and image...
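The underlying intuition, that a single erroneous MAC can be dropped rather than allowed to corrupt the running partial sum, can be sketched as follows. This is a simplified caricature: `dot_with_te_drop` and its random error model are assumptions for illustration, not the paper's exact mechanism.

```python
import random

def dot_with_te_drop(xs, ws, err_rate=0.0, seed=0):
    # Accumulate a dot product, dropping any MAC flagged by a (simulated)
    # timing error instead of letting a wrong product corrupt the sum.
    rng = random.Random(seed)
    total = 0.0
    for x, w in zip(xs, ws):
        if rng.random() < err_rate:
            continue  # drop this MAC's contribution entirely
        total += x * w
    return total

assert dot_with_te_drop([1, 2, 3], [4, 5, 6]) == 32.0             # error-free: exact
assert dot_with_te_drop([1, 2, 3], [4, 5, 6], err_rate=1.0) == 0.0
```

Because each dropped term removes only a small fraction of a long dot product, the accumulated result stays close to the exact value at realistic error rates, which is why accuracy can survive aggressive underscaling.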
Editor's note: The systolic array is experiencing a renaissance after being adopted by the Google TPU as the core computing architecture for machine learning acceleration. In this article, the authors propose two strategies to enhance the fault tolerance of systolic array-based deep neural network accelerators.
Modern heterogeneous SoCs feature a mix of many hardware accelerators and general-purpose cores that run applications in parallel. This brings challenges in managing how they access shared resources, e.g., the memory hierarchy, communication channels, and on-chip power. We address these challenges through flexible orchestration of data on a 74Tbps network-on-chip (NoC) for dynamic management of resources under contention and a distributed hierarchical power management (DHPM) scheme. Developing and testing these ideas requires a comprehensive evaluation platform....
Machine learning, in particular deep learning, is being used in almost all aspects of life to facilitate humans, specifically in mobile and Internet of Things (IoT)-based applications. Due to its state-of-the-art performance, deep learning is also employed in safety-critical applications, for instance, autonomous vehicles. Reliability and security are two key required characteristics of these applications because of the impact they can have on human life. Towards this end, in this paper, we highlight the current progress, challenges, and research...
Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile hardware design tools and compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC)...
Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse through 12 computationally intensive...
Deep learning recommendation systems must provide high-quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines that maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware...
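The multi-stage idea can be sketched as a simple two-stage cascade in which a cheap model prunes the candidate set before an expensive model ranks the survivors. The function `two_stage_rank` below is a hypothetical illustration, not RecPipe's actual scheduler:

```python
def two_stage_rank(candidates, cheap_score, precise_score, keep=100, top=10):
    # Stage 1: a cheap model prunes the candidate set down to a shortlist,
    # so the expensive model never sees most items.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:keep]
    # Stage 2: the expensive model ranks only the survivors.
    return sorted(shortlist, key=precise_score, reverse=True)[:top]

# Toy usage: here the cheap score happens to agree with the precise one.
items = list(range(50))
assert two_stage_rank(items, lambda x: x, lambda x: x, keep=10, top=3) == [49, 48, 47]
```

The design point is the `keep` parameter: a larger shortlist preserves more quality but shifts more work onto the expensive stage, which is exactly the quality/latency trade-off a scheduler can tune per platform.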
The evolution of AI algorithms has not only revolutionized many application domains, but also posed tremendous challenges to the hardware platform. Advanced packaging technology today, such as 2.5D and 3D interconnection, provides a promising solution to meet the ever-increasing demands of bandwidth, data movement, and system scale in computing. This work presents HISIM, a modeling and benchmarking tool for chiplet-based heterogeneous integration. HISIM emphasizes the hierarchical interconnection that connects...
Although it is evident that zoster vaccination reduces postherpetic neuralgia (PHN) risk by reducing herpes zoster (HZ) occurrence, it is less clear whether the vaccine protects against PHN among patients who develop HZ despite previous vaccination. This cohort study included immunocompetent patients with HZ. The 1155 vaccinated individuals were vaccinated at age ≥60 years and had an HZ episode after vaccination. Vaccinated patients were matched 1:1 by sex to unvaccinated patients. Trained medical residents reviewed the full medical record to determine...
Reconfigurable architectures are today experiencing a renewed interest for their ability to provide specialization without sacrificing the capability to adapt to disparate workloads. Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs), while offering increased hardware efficiency with respect to field-programmable gate arrays (FPGAs). This makes CGRAs a promising alternative to enable power-/area-efficient acceleration across different application...
This paper presents an agile-designed domain-specific SoC in 12nm CMOS for the emerging application domain of swarm-based perception. Featuring a heterogeneous tile-based architecture, the SoC was designed with an agile methodology using open-source processors and accelerators, interconnected by a multi-plane NoC. A reconfigurable memory hierarchy and a CS-GALS clocking scheme allow the SoC to run at a variety of performance/power operating points. Compared to a high-end FPGA, the presented SoC achieves 7× the performance and 62× the efficiency...
Deep neural networks (DNNs) are increasingly being accelerated on application-specific hardware such as the Google TPU, designed especially for deep learning. Timing speculation is a promising approach to further increase the energy efficiency of DNN accelerators. Architectural exploration of timing speculation requires detailed gate-level simulations that can be time-consuming for large DNNs, which execute millions of multiply-and-accumulate (MAC) operations. In this paper we propose FATE, a new methodology for fast and...
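The motivation for faster simulation can be illustrated with a sampling sketch: estimating the timing-error rate from a random subset of MAC delays instead of simulating all of them. This is a generic Monte Carlo illustration with assumed function names, not FATE's actual method:

```python
import random

def error_rate_exact(mac_delays, clock_period):
    # Ground truth: fraction of MACs whose path delay exceeds the clock period.
    return sum(d > clock_period for d in mac_delays) / len(mac_delays)

def error_rate_sampled(mac_delays, clock_period, n_samples, seed=0):
    # Monte Carlo estimate from a small random sample of MAC operations.
    rng = random.Random(seed)
    sample = [rng.choice(mac_delays) for _ in range(n_samples)]
    return sum(d > clock_period for d in sample) / n_samples

delays = [1.0] * 900 + [3.0] * 100   # 10% of MACs violate a 2.0 ns clock
assert error_rate_exact(delays, 2.0) == 0.1
est = error_rate_sampled(delays, 2.0, 2000)
assert abs(est - 0.1) < 0.05         # close to ground truth at far lower cost
```

The same principle scales to millions of MACs: the sample size needed for a given confidence is independent of the total operation count, which is what makes sampled architectural exploration tractable.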
This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to the total energy consumption of such accelerators; while prior work has proposed activation compression, activations are still stored on-chip in uncompressed form, requiring either large buffers or slow, energy-hungry accesses. In this paper, we propose...
The millimeter wave (mmWave) bands have attracted considerable attention for high-precision localization applications due to the ability to capture measurements with high angular and temporal resolution. This paper explores mmWave-based positioning for a target localization problem where a fixed target broadcasts mmWave signals and a mobile robotic agent attempts to locate and navigate to the target. A three-stage procedure is proposed: First, the agent uses tensor decomposition methods to detect the multipath channel components and estimate their parameters. Second,...
Brain-inspired hyperdimensional computing (HDC) is an emerging computational paradigm that has achieved success in various domains. HDC mimics brain cognition and leverages high-dimensional vectors with fully distributed holographic representation and (pseudo)randomness. Compared to traditional machine learning methods, HDC offers several critical advantages, including smaller model size, lower computation cost, and one-shot learning capability, making it a promising candidate for low-power platforms. Despite its growing popularity...
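The core HDC operations on bipolar hypervectors — binding, bundling, and similarity — can be sketched in a few lines. The helper names below are illustrative, not tied to any specific HDC library:

```python
import random

def rand_hv(d, rng):
    # A random bipolar hypervector: each of the d elements is -1 or +1.
    return [rng.choice((-1, 1)) for _ in range(d)]

def bind(a, b):
    # Binding: elementwise multiplication; binding a vector with itself
    # yields the all-ones identity, so binding is its own inverse.
    return [x * y for x, y in zip(a, b)]

def bundle(hvs):
    # Bundling: elementwise majority vote over a set of hypervectors,
    # producing a vector similar to all of its inputs.
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]

def sim(a, b):
    # Normalized dot-product similarity in [-1, 1].
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
a, b = rand_hv(1000, rng), rand_hv(1000, rng)
assert sim(a, a) == 1.0              # a vector is maximally similar to itself
assert bind(a, a) == [1] * 1000      # binding is self-inverse
assert abs(sim(a, b)) < 0.2          # random hypervectors are quasi-orthogonal
```

Quasi-orthogonality of random hypervectors is what gives HDC its one-shot flavor: a class can be represented by bundling a handful of encoded examples and classified by nearest similarity, with no gradient-based training.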
A novel strain engineering technique is reported to realize enhancement-mode high electron mobility transistors (HEMTs) with ultralow specific on-resistance ($R_{\mathrm{on,sp}}$) fabricated on a 200 mm CMOS-compatible process platform. In this scheme, a strain enhancement layer deposited on the access region of the HEMT by low-cost CVD is demonstrated to reduce $R_{\mathrm{on,sp}}$. As compared with 100 V-rated devices without...
Hardware-accelerated learning and inference algorithms are quite popular in edge devices, where predictable timing behavior and minimal energy consumption are required while maintaining robustness to errors. To achieve this, dynamic voltage scaling techniques have been utilized in several accelerators. To that end, this article presents Thundervolt, a framework allowing adaptive, aggressive voltage underscaling while maintaining the (reliability, predictability, performance) characteristics of such...