Jeff Zhang

ORCID: 0000-0001-7411-8923
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Embedded Systems Design Techniques
  • Advanced Neural Network Applications
  • Interconnection Networks and Systems
  • Offshore Engineering and Technologies
  • MRI in cancer diagnosis
  • Semiconductor materials and devices
  • GaN-based semiconductor devices and materials
  • Indoor and Outdoor Localization Technologies
  • Millimeter-Wave Propagation and Modeling
  • Silicon Carbide Semiconductor Technologies
  • Oil and Gas Production Techniques
  • CCD and CMOS Imaging Sensors
  • Photonic and Optical Devices
  • Advanced MRI Techniques and Applications
  • Advanced Data Storage Technologies
  • Neural Networks and Reservoir Computing
  • Radiation Effects in Electronics
  • Adversarial Robustness in Machine Learning
  • Recommender Systems and Techniques
  • Distributed and Parallel Computing Systems
  • Energy Harvesting in Wireless Networks
  • Drilling and Well Engineering

Arizona State University
2023-2025

Harvard University Press
2021-2024

ShanghaiTech University
2024

Institute of Wood Science and Technology
2020-2024

New York University
2016-2023

Harvard University
2020-2022

University at Buffalo, State University of New York
2022

Zhuhai Institute of Advanced Technology
2019-2021

Los Angeles Medical Center
2015

Kaiser Permanente
2015

Due to their growing popularity and computational cost, deep neural networks (DNNs) are being targeted for hardware acceleration. A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic array based matrix multiplication unit at its core. This paper deals with the design of fault-tolerant DNN accelerators in high defect rate technologies. To this end, we empirically show that the classification accuracy of a baseline TPU drops significantly even at extremely...

10.1109/vts.2018.8368656 preprint EN 2018-04-01
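The systolic array based matrix multiplication unit mentioned above can be illustrated with a small functional model (a toy sketch of an output-stationary array, not the TPU's actual design):

```python
# Illustrative sketch: an output-stationary systolic array computing
# C = A @ B, where each cell (i, j) accumulates one output element via
# multiply-and-accumulate (MAC) operations as operands stream through.

def systolic_matmul(A, B):
    n, k = len(A), len(A[0])
    m = len(B[0])
    # One accumulator per processing element in the n x m grid.
    C = [[0] * m for _ in range(n)]
    # Step t streams A's column t across rows and B's row t down columns;
    # every cell performs one MAC per step, so the array needs k steps
    # (pipeline fill/drain latency is omitted here).
    for t in range(k):
        for i in range(n):
            for j in range(m):
                C[i][j] += A[i][t] * B[t][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

Because every output element is the sum of k MACs flowing through one physical cell, a single defective cell corrupts an entire row or column of outputs, which is why defects hurt accuracy so sharply.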

Hardware accelerators are being increasingly deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34%-57% energy savings on state-of-the-art speech and image...

10.1145/3195970.3196129 article EN 2018-06-19
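The idea of tolerating timing errors by discarding an errant MAC's contribution, rather than stalling or rolling back, can be modeled in a few lines (a toy error model with illustrative parameters, not the paper's evaluation):

```python
import random

# Toy model of timing-error dropping during accumulation (in the spirit of
# Thundervolt's approach; the error model and parameters are illustrative):
# at an underscaled voltage each MAC may miss timing with probability p_err,
# and a detected errant MAC's update is simply dropped.

def dot_with_te_drop(w, x, p_err, rng):
    acc = 0.0
    for wi, xi in zip(w, x):
        if rng.random() < p_err:
            continue  # timing error detected: drop this MAC's update
        acc += wi * xi
    return acc

rng = random.Random(42)
w = [1.0] * 1000
x = [1.0] * 1000
exact = sum(wi * xi for wi, xi in zip(w, x))
approx = dot_with_te_drop(w, x, p_err=0.01, rng=rng)
# With p_err = 1%, roughly 99% of the exact sum survives, so the dot
# product (and hence classification accuracy) degrades only slightly.
```

The intuition is that a DNN's accumulations are long and its decisions are robust to small perturbations, so losing a small fraction of MAC updates barely moves the result.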

Editor's note: The systolic array is enjoying a renaissance after being adopted by the Google TPU as the core computing architecture for machine learning acceleration. In this article, the authors propose two strategies to enhance the fault tolerance of systolic array based deep neural network accelerators.

10.1109/mdat.2019.2915656 article EN IEEE Design and Test 2019-05-08

Hardware accelerators are being increasingly deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34%-57% energy savings on state-of-the-art speech and image...

10.1109/dac.2018.8465918 article EN 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) 2018-06-01

Modern heterogeneous SoCs feature a mix of many hardware accelerators and general-purpose cores that run applications in parallel. This brings challenges in managing how they access shared resources, e.g., the memory hierarchy, communication channels, and on-chip power. We address these challenges through flexible orchestration of data on a 74Tbps network-on-chip (NoC) for dynamic management of resources under contention and a distributed power management (DHPM) scheme. Developing and testing these ideas requires a comprehensive evaluation platform....

10.1109/isscc49657.2024.10454572 article EN 2024 IEEE International Solid-State Circuits Conference (ISSCC) 2024-02-18

Machine learning, in particular deep learning, is being used in almost all aspects of life to facilitate humans, specifically in mobile and Internet of Things (IoT)-based applications. Due to its state-of-the-art performance, deep learning is also employed in safety-critical applications, for instance, autonomous vehicles. Reliability and security are two key required characteristics of these applications because of the impact they can have on human life. Towards this, in this paper, we highlight the current progress, challenges and research...

10.1145/3316781.3323472 article EN 2019-05-23

Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile design tools and compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC)...

10.1109/mm.2022.3178580 article EN IEEE Micro 2022-06-01

Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse through 12 computationally intensive...

10.1109/isscc42615.2023.10067817 article EN 2023 IEEE International Solid-State Circuits Conference (ISSCC) 2023-02-19
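One common way to cut the cost of serially traversing all encoder layers is confidence-based early exit; the sketch below shows the control flow of that generic technique (names, models, and the threshold are illustrative, not necessarily the paper's method):

```python
import math

# Hypothetical early-exit sketch: after each transformer layer, a
# lightweight classifier produces class probabilities, and inference stops
# as soon as the prediction entropy (uncertainty) drops below a threshold.

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_inference(layers, classify, x, threshold=0.3):
    for depth, layer in enumerate(layers, start=1):
        x = layer(x)
        probs = classify(x)
        if entropy(probs) < threshold:
            return probs, depth  # confident: skip the remaining layers
    return probs, len(layers)

# Toy model: each "layer" sharpens a scalar score; the classifier is a
# two-way softmax over (x, -x).
layers = [lambda x: x + 1.0] * 12
def classify(x):
    e0, e1 = math.exp(x), math.exp(-x)
    return [e0 / (e0 + e1), e1 / (e0 + e1)]

probs, depth = early_exit_inference(layers, classify, 0.0)
# On easy inputs the loop exits after only a few of the 12 layers.
```

Easy inputs exit early and pay for only a fraction of the 12-layer traversal, which directly targets the latency and energy constraints described above.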

Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map the pipeline engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware...

10.1145/3466752.3480127 article EN 2021-10-17
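The multi-stage pipeline idea can be sketched as a cascade of scorers, each pruning the candidate set for the next, heavier stage (toy scorers and parameters here, not RecPipe's implementation):

```python
# Illustrative multi-stage ranking cascade: a cheap front-end model prunes
# a large candidate set, and a heavier back-end model re-ranks only the
# survivors, trading per-stage compute against final quality.

def multistage_rank(candidates, stages, k=3):
    """`stages` is a list of (score_fn, keep) pairs, cheapest first.
    Each stage scores the surviving candidates and keeps the top `keep`."""
    survivors = list(candidates)
    for score_fn, keep in stages:
        survivors = sorted(survivors, key=score_fn, reverse=True)[:keep]
    return survivors[:k]

# Toy scorers: stage 1 is a cheap popularity proxy, stage 2 a more
# accurate (and more expensive) personalized model.
cheap = lambda c: c["popularity"]
heavy = lambda c: c["popularity"] * c["affinity"]

items = [{"id": i, "popularity": i % 5, "affinity": (i * 7) % 3}
         for i in range(100)]
top = multistage_rank(items, stages=[(cheap, 20), (heavy, 5)], k=3)
```

The heavy model only ever sees 20 of the 100 candidates, so its cost scales with the pruned set size rather than the full catalog, which is the compute-versus-quality lever the pipeline exposes.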

The evolution of AI algorithms has not only revolutionized many application domains, but has also posed tremendous challenges on the hardware platform. Advanced packaging technology today, such as 2.5D and 3D interconnection, provides a promising solution to meet the ever-increasing demands of bandwidth, data movement, and system scale in computing. This work presents HISIM, a modeling and benchmarking tool for chiplet-based heterogeneous integration. HISIM emphasizes the hierarchical interconnection that connects...

10.1109/asp-dac58780.2024.10473875 article EN 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) 2024-01-22

Although it is evident that zoster vaccination reduces postherpetic neuralgia (PHN) risk by reducing herpes zoster (HZ) occurrence, it is less clear whether the vaccine protects against PHN among patients who develop HZ despite previous vaccination. This cohort study included immunocompetent patients with HZ. The vaccinated group comprised 1155 individuals who were vaccinated at age ≥60 years and had an HZ episode after vaccination. Vaccinated patients were matched 1:1 on sex to unvaccinated patients. Trained medical residents reviewed the full medical record to determine...

10.1093/infdis/jiv244 article EN The Journal of Infectious Diseases 2015-06-01

Reconfigurable architectures are today experiencing a renewed interest for their ability to provide specialization without sacrificing the capability to adapt to disparate workloads. Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs) while offering increased hardware efficiency with respect to field-programmable gate arrays (FPGAs). This makes CGRAs a promising alternative to enable power-/area-efficient acceleration across different application...

10.1109/asap52443.2021.00029 article EN 2021-07-01

This paper presents an agile-designed domain-specific SoC in 12nm CMOS for the emerging application domain of swarm-based perception. Featuring a heterogeneous tile-based architecture, the SoC was designed with an agile methodology using open-source processors and accelerators, interconnected by a multi-plane NoC. A reconfigurable memory hierarchy and a CS-GALS clocking scheme allow the SoC to run at a variety of performance/power operating points. Compared to a high-end FPGA, the presented SoC achieves 7× performance and 62× efficiency...

10.1109/esscirc55480.2022.9911456 article EN ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC) 2022-09-19

Deep neural networks (DNNs) are increasingly being accelerated on application-specific hardware such as the Google TPU, designed especially for deep learning. Timing speculation is a promising approach to further increase the energy efficiency of DNN accelerators. Architectural exploration of timing speculation requires detailed gate-level simulations that can be time-consuming for large DNNs, which execute millions of multiply-and-accumulate (MAC) operations. In this paper we propose FATE, a new methodology for fast and...

10.1145/3240765.3240809 article EN 2018-11-05
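The speed/accuracy trade of simulating a subset of MACs instead of all of them can be illustrated with a plain Monte-Carlo estimate (a generic stand-in for the idea; FATE's actual methodology is more sophisticated):

```python
import random

# Generic sampling sketch: instead of running an expensive gate-level
# timing simulation for every MAC operation, simulate a random sample and
# extrapolate the total timing-error count.

def estimate_error_count(mac_inputs, slow_sim_errs, sample_frac, rng):
    """`slow_sim_errs(op)` returns 1 if the op misses timing, else 0
    (the expensive per-op simulation). Simulate only a `sample_frac`
    fraction of the ops and scale the result up."""
    n = len(mac_inputs)
    k = max(1, int(n * sample_frac))
    sample = rng.sample(mac_inputs, k)
    errs = sum(slow_sim_errs(op) for op in sample)
    return errs * n / k

rng = random.Random(7)
# Toy workload: an op "misses timing" when its operand magnitude is large.
ops = [rng.gauss(0, 1) for _ in range(100_000)]
true_errs = sum(1 for op in ops if abs(op) > 2)
est = estimate_error_count(ops, lambda op: 1 if abs(op) > 2 else 0,
                           sample_frac=0.05, rng=rng)
# The 5% sample tracks the exhaustive count to within a few percent while
# invoking the expensive simulation 20x less often.
```

This is the standard Monte-Carlo argument: the estimator is unbiased and its relative error shrinks with the square root of the sample size, so a small fraction of the MACs suffices for architectural exploration.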

This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to the total energy consumption of such accelerators; while prior work has proposed activation compression, activations are still stored on-chip in uncompressed form, requiring either large buffers or slow, energy-hungry off-chip accesses. In this paper, we propose...

10.1145/3358178 article EN ACM Transactions on Embedded Computing Systems 2019-10-07
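Since post-ReLU activations are typically sparse, even a simple zero run-length encoding shrinks activation storage; the sketch below shows a generic encoder/decoder pair (an illustrative format, not the paper's on-chip scheme):

```python
# Generic zero run-length encoding for sparse activations: each nonzero
# value is stored together with the count of zeros preceding it, so long
# zero runs (common after ReLU) collapse to a single pair.

def zrle_encode(acts):
    """Encode a flat activation list as (zero_run_length, value) pairs."""
    out, run = [], 0
    for a in acts:
        if a == 0:
            run += 1
        else:
            out.append((run, a))
            run = 0
    if run:
        out.append((run, None))  # trailing zeros carry no value
    return out

def zrle_decode(pairs):
    acts = []
    for run, val in pairs:
        acts.extend([0] * run)
        if val is not None:
            acts.append(val)
    return acts

acts = [0, 0, 3, 0, 0, 0, 7, 1, 0]
enc = zrle_encode(acts)
assert zrle_decode(enc) == acts  # lossless round trip
```

Storing activations in compressed form on-chip is what lets a fixed-size buffer hold larger feature maps, cutting the energy-hungry off-chip traffic the abstract describes.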

The millimeter wave (mmWave) bands have attracted considerable attention for high precision localization applications due to the ability to capture high angular and temporal resolution measurements. This paper explores mmWave-based positioning for a target localization problem where a fixed target broadcasts mmWave signals and a mobile robotic agent attempts to locate and navigate to the target. A three-stage procedure is proposed: First, the agent uses tensor decomposition methods to detect the multipath channel components and estimate their parameters. Second,...

10.1109/ojcoms.2022.3155572 article EN cc-by IEEE Open Journal of the Communications Society 2022-01-01

Brain-inspired hyperdimensional computing (HDC) is an emerging computational paradigm that has achieved success in various domains. HDC mimics brain cognition and leverages vectors with fully distributed holographic representation and (pseudo)randomness. Compared to traditional machine learning methods, HDC offers several critical advantages, including smaller model size, less computation cost, and one-shot capability, making it a promising candidate for low-power platforms. Despite its growing popularity...

10.1109/asap52443.2021.00039 article EN 2021-07-01
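The basic HDC operations (random bipolar hypervectors, majority bundling, and dot-product similarity) can be shown in a minimal sketch; these are textbook operations, not the specific system studied in the paper:

```python
import random

# Minimal HDC sketch: symbols map to random bipolar hypervectors; a class
# prototype is the element-wise majority "bundle" of its training vectors;
# queries are matched by dot-product similarity.

D = 1000  # hypervector dimensionality
rng = random.Random(0)

def rand_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def bundle(hvs):
    # Element-wise majority vote across the input hypervectors.
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]

def sim(a, b):
    # Unnormalized dot product; random hypervectors are near-orthogonal,
    # so unrelated vectors score near zero while matches score near D.
    return sum(x * y for x, y in zip(a, b))

a, b = rand_hv(), rand_hv()
proto = bundle([a, a, b])  # prototype dominated by `a`
assert sim(proto, a) > sim(proto, b)
```

Because every operation is an element-wise integer computation over a flat vector, the model is tiny and trivially parallel, which is exactly why HDC suits low-power hardware.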

Hardware accelerators are being increasingly deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34%-57% energy savings on state-of-the-art speech and image...

10.48550/arxiv.1802.03806 preprint EN other-oa arXiv (Cornell University) 2018-01-01

A novel strain engineering technique is reported to realize enhancement-mode high electron mobility transistors (HEMTs) with ultralow specific on-resistance (R_on,sp) fabricated on a 200 mm CMOS-compatible process platform. In this scheme, an enhancement layer deposited on the access region of the HEMT by low-cost CVD is demonstrated to reduce R_on,sp. As compared to 100 V-rated devices without...

10.1109/ispsd.2019.8757694 article EN 2019-05-01

Hardware-accelerated learning and inference algorithms are quite popular in edge devices, where predictable timing behavior and minimal energy consumption are required while maintaining robustness to errors. To achieve this, dynamic voltage scaling techniques have been utilized in several accelerators. Therefore, this article presents Thundervolt, a framework allowing adaptive and aggressive voltage underscaling while maintaining the reliability, predictability, and performance of such...

10.1109/mdat.2019.2947271 article EN IEEE Design and Test 2019-10-14