- Parallel Computing and Optimization Techniques
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Embedded Systems Design Techniques
- Advanced Neural Network Applications
- Interconnection Networks and Systems
- Offshore Engineering and Technologies
- MRI in Cancer Diagnosis
- Semiconductor Materials and Devices
- GaN-based semiconductor devices and materials
- Indoor and Outdoor Localization Technologies
- Millimeter-Wave Propagation and Modeling
- Silicon Carbide Semiconductor Technologies
- Oil and Gas Production Techniques
- CCD and CMOS Imaging Sensors
- Photonic and Optical Devices
- Advanced MRI Techniques and Applications
- Advanced Data Storage Technologies
- Neural Networks and Reservoir Computing
- Radiation Effects in Electronics
- Adversarial Robustness in Machine Learning
- Recommender Systems and Techniques
- Distributed and Parallel Computing Systems
- Energy Harvesting in Wireless Networks
- Drilling and Well Engineering
Arizona State University
2023-2025
Harvard University Press
2021-2024
ShanghaiTech University
2024
Institute of Wood Science and Technology
2020-2024
New York University
2016-2023
Harvard University
2020-2022
University at Buffalo, State University of New York
2022
Zhuhai Institute of Advanced Technology
2019-2021
Los Angeles Medical Center
2015
Kaiser Permanente
2015
Due to their growing popularity and computational cost, deep neural networks (DNNs) are being targeted for hardware acceleration. A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic array-based matrix multiplication unit at its core. This paper deals with the design of fault-tolerant DNN accelerators in high defect rate technologies. To this end, we empirically show that the classification accuracy of a baseline TPU drops significantly even at extremely...
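As background, the systolic array at the core of such accelerators computes a matrix product by streaming activations through a grid of multiply-and-accumulate (MAC) processing elements that each hold one weight. A minimal functional sketch (the name `systolic_matmul` and the loop structure are illustrative assumptions, not the TPU's actual microarchitecture):

```python
def systolic_matmul(X, W):
    # Weight-stationary sketch of Y = X @ W: PE (r, c) holds weight W[r][c];
    # activations stream through, and partial sums accumulate down each column.
    n, k = len(X), len(X[0])          # X: n x k activations
    m = len(W[0])                     # W: k x m weights (held in the PE grid)
    Y = [[0] * m for _ in range(n)]
    for i in range(n):                # one activation row per wavefront
        for r in range(k):            # PE rows (weight rows)
            for c in range(m):        # PE columns
                Y[i][c] += X[i][r] * W[r][c]   # one MAC per PE visit
    return Y

assert systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

A fault in a single PE corrupts every product that flows through it, which is why a defective array can degrade classification accuracy so sharply.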
Hardware accelerators are being increasingly deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy, even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34%-57% energy savings on state-of-the-art speech and image...
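The underlying intuition, that a single erroneous MAC can be dropped rather than allowed to corrupt the running partial sum, can be sketched as follows. This is a simplified caricature: `dot_with_te_drop` and its random error model are assumptions for illustration, not the paper's exact mechanism.

```python
import random

def dot_with_te_drop(xs, ws, err_rate=0.0, seed=0):
    # Accumulate a dot product, dropping any MAC flagged by a (simulated)
    # timing error instead of letting a wrong product corrupt the sum.
    rng = random.Random(seed)
    total = 0.0
    for x, w in zip(xs, ws):
        if rng.random() < err_rate:
            continue  # drop this MAC's contribution entirely
        total += x * w
    return total

assert dot_with_te_drop([1, 2, 3], [4, 5, 6]) == 32.0             # error-free: exact
assert dot_with_te_drop([1, 2, 3], [4, 5, 6], err_rate=1.0) == 0.0
```

Because each dropped term removes only a small fraction of a long dot product, the accumulated result stays close to the exact value at realistic error rates, which is why accuracy can survive aggressive underscaling.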
Editor's note: The systolic array is experiencing a renaissance after being adopted by the Google TPU as the core computing architecture for machine learning acceleration. In this article, the authors propose two strategies to enhance the fault tolerance of systolic array-based deep neural network accelerators.
Modern heterogeneous SoCs feature a mix of many hardware accelerators and general-purpose cores that run applications in parallel. This brings challenges in managing how they access shared resources, e.g., the memory hierarchy, communication channels, and on-chip power. We address these challenges through flexible orchestration of data on a 74Tbps network-on-chip (NoC) for dynamic management of resources under contention and a distributed hierarchical power management (DHPM) scheme. Developing and testing these ideas requires a comprehensive evaluation platform....
Machine learning, in particular deep learning, is being used in almost all aspects of life to facilitate humans, specifically in mobile and Internet of Things (IoT)-based applications. Due to its state-of-the-art performance, deep learning is also employed in safety-critical applications, for instance, autonomous vehicles. Reliability and security are two key required characteristics of these applications because of the impact they can have on human life. Towards this end, in this paper, we highlight the current progress, challenges, and research...
Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile hardware design tools and compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC)...
Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse through 12 computationally intensive...
Deep learning recommendation systems must provide high-quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines that maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware...
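The multi-stage idea can be sketched as a simple two-stage cascade in which a cheap model prunes the candidate set before an expensive model ranks the survivors. The function `two_stage_rank` below is a hypothetical illustration, not RecPipe's actual scheduler:

```python
def two_stage_rank(candidates, cheap_score, precise_score, keep=100, top=10):
    # Stage 1: a cheap model prunes the candidate set down to a shortlist,
    # so the expensive model never sees most items.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:keep]
    # Stage 2: the expensive model ranks only the survivors.
    return sorted(shortlist, key=precise_score, reverse=True)[:top]

# Toy usage: here the cheap score happens to agree with the precise one.
items = list(range(50))
assert two_stage_rank(items, lambda x: x, lambda x: x, keep=10, top=3) == [49, 48, 47]
```

The design point is the `keep` parameter: a larger shortlist preserves more quality but shifts more work onto the expensive stage, which is exactly the quality/latency trade-off a scheduler can tune per platform.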
The evolution of AI algorithms has not only revolutionized many application domains, but also posed tremendous challenges to the hardware platform. Advanced packaging technology today, such as 2.5D and 3D interconnection, provides a promising solution to meet the ever-increasing demands of bandwidth, data movement, and system scale in computing. This work presents HISIM, a modeling and benchmarking tool for chiplet-based heterogeneous integration. HISIM emphasizes the hierarchical interconnection that connects...
Although it is evident that zoster vaccination reduces postherpetic neuralgia (PHN) risk by reducing herpes zoster (HZ) occurrence, it is less clear whether the vaccine protects against PHN among patients who develop HZ despite previous vaccination. This cohort study included immunocompetent patients with HZ. The 1155 vaccinated individuals were vaccinated at age ≥60 years and had an HZ episode after vaccination. Vaccinated patients were matched 1:1 by sex to unvaccinated patients. Trained medical residents reviewed the full medical record to determine...
Reconfigurable architectures are today experiencing a renewed interest for their ability to provide specialization without sacrificing the capability to adapt to disparate workloads. Coarse-grained reconfigurable arrays (CGRAs) provide higher flexibility than application-specific integrated circuits (ASICs), while offering increased hardware efficiency with respect to field-programmable gate arrays (FPGAs). This makes CGRAs a promising alternative to enable power-/area-efficient acceleration across different application...
This paper presents an agile-designed domain-specific SoC in 12nm CMOS for the emerging application domain of swarm-based perception. Featuring a heterogeneous tile-based architecture, the SoC was designed with an agile methodology using open-source processors and accelerators, interconnected by a multi-plane NoC. A reconfigurable memory hierarchy and a CS-GALS clocking scheme allow the SoC to run at a variety of performance/power operating points. Compared to a high-end FPGA, the presented SoC achieves 7× the performance and 62× the efficiency...
Deep neural networks (DNNs) are increasingly being accelerated on application-specific hardware such as the Google TPU, designed especially for deep learning. Timing speculation is a promising approach to further increase the energy efficiency of DNN accelerators. Architectural exploration of timing speculation requires detailed gate-level simulations that can be time-consuming for large DNNs, which execute millions of multiply-and-accumulate (MAC) operations. In this paper we propose FATE, a new methodology for fast and...
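The motivation for faster simulation can be illustrated with a sampling sketch: estimating the timing-error rate from a random subset of MAC delays instead of simulating all of them. This is a generic Monte Carlo illustration with assumed function names, not FATE's actual method:

```python
import random

def error_rate_exact(mac_delays, clock_period):
    # Ground truth: fraction of MACs whose path delay exceeds the clock period.
    return sum(d > clock_period for d in mac_delays) / len(mac_delays)

def error_rate_sampled(mac_delays, clock_period, n_samples, seed=0):
    # Monte Carlo estimate from a small random sample of MAC operations.
    rng = random.Random(seed)
    sample = [rng.choice(mac_delays) for _ in range(n_samples)]
    return sum(d > clock_period for d in sample) / n_samples

delays = [1.0] * 900 + [3.0] * 100   # 10% of MACs violate a 2.0 ns clock
assert error_rate_exact(delays, 2.0) == 0.1
est = error_rate_sampled(delays, 2.0, 2000)
assert abs(est - 0.1) < 0.05         # close to ground truth at far lower cost
```

The same principle scales to millions of MACs: the sample size needed for a given confidence is independent of the total operation count, which is what makes sampled architectural exploration tractable.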
This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to the total energy consumption of such accelerators; while prior work has proposed activation compression, activations are still stored on-chip in uncompressed form, requiring either large buffers or slow, energy-hungry accesses. In this paper, we propose...
The millimeter wave (mmWave) bands have attracted considerable attention for high-precision localization applications due to the ability to capture measurements with high angular and temporal resolution. This paper explores mmWave-based positioning for a target localization problem where a fixed target broadcasts mmWave signals and a mobile robotic agent attempts to locate and navigate to the target. A three-stage procedure is proposed: First, the agent uses tensor decomposition methods to detect the multipath channel components and estimate their parameters. Second,...
Brain-inspired hyperdimensional computing (HDC) is an emerging computational paradigm that has achieved success in various domains. HDC mimics brain cognition and leverages high-dimensional vectors with fully distributed holographic representation and (pseudo)randomness. Compared to traditional machine learning methods, HDC offers several critical advantages, including smaller model size, lower computation cost, and one-shot learning capability, making it a promising candidate for low-power platforms. Despite its growing popularity...
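The core HDC operations on bipolar hypervectors — binding, bundling, and similarity — can be sketched in a few lines. The helper names below are illustrative, not tied to any specific HDC library:

```python
import random

def rand_hv(d, rng):
    # A random bipolar hypervector: each of the d elements is -1 or +1.
    return [rng.choice((-1, 1)) for _ in range(d)]

def bind(a, b):
    # Binding: elementwise multiplication; binding a vector with itself
    # yields the all-ones identity, so binding is its own inverse.
    return [x * y for x, y in zip(a, b)]

def bundle(hvs):
    # Bundling: elementwise majority vote over a set of hypervectors,
    # producing a vector similar to all of its inputs.
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]

def sim(a, b):
    # Normalized dot-product similarity in [-1, 1].
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
a, b = rand_hv(1000, rng), rand_hv(1000, rng)
assert sim(a, a) == 1.0              # a vector is maximally similar to itself
assert bind(a, a) == [1] * 1000      # binding is self-inverse
assert abs(sim(a, b)) < 0.2          # random hypervectors are quasi-orthogonal
```

Quasi-orthogonality of random hypervectors is what gives HDC its one-shot flavor: a class can be represented by bundling a handful of encoded examples and classified by nearest similarity, with no gradient-based training.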
A novel strain engineering technique is reported to realize enhancement-mode high electron mobility transistors (HEMTs) with ultralow specific on-resistance ($R_{\mathrm{on,sp}}$) fabricated on a 200 mm CMOS-compatible process platform. In this scheme, a strain enhancement layer deposited on the access region of the HEMT by low-cost CVD is demonstrated to reduce $R_{\mathrm{on,sp}}$. As compared with 100 V-rated devices without...
Hardware-accelerated learning and inference algorithms are quite popular in edge devices, where predictable timing behavior and minimal energy consumption are required while maintaining robustness to errors. To achieve this, dynamic voltage scaling techniques have been utilized in several accelerators. To that end, this article presents Thundervolt, a framework allowing adaptive, aggressive voltage underscaling while maintaining the (reliability, predictability, performance) characteristics of such...