NFDI4DS | UHH-SEMS - Publication Details

Pierce Chuang

ORCID: 0000-0001-5850-7048

About

Contact & Profiles

A5069007377

Research Areas

Low-power high-performance VLSI design
Advanced Neural Network Applications
Advancements in Semiconductor Devices and Circuit Design
Analog and Mixed-Signal Circuit Design
Advanced Memory and Neural Computing
Natural Language Processing Techniques
Topic Modeling
Neural Networks and Applications
Quantum-Dot Cellular Automata
Semiconductor materials and devices
Adversarial Robustness in Machine Learning
Advancements in PLL and VCO Technologies
Speech Recognition and Synthesis
Domain Adaptation and Few-Shot Learning
Electromagnetic Compatibility and Noise Suppression
Parallel Computing and Optimization Techniques
Music and Audio Processing
Speech and Audio Processing
Text Readability and Simplification
Electrostatic Discharge in Electronics
Sparse and Compressive Sensing Techniques
Machine Learning and Data Classification
Genomics and Phylogenetic Studies
VLSI and Analog Circuit Testing
Machine Learning and ELM

META Health
2023-2024

Meta (Israel)
2021-2022

Meta (United States)
2020

IBM Research - Thomas J. Watson Research Center
2018

Alliance for Safe Kids
2018

IBM (United States)
2015-2018

University of Waterloo
2009-2015

PACT: Parameterized Clipping Activation for Quantized Neural Networks

OPENALEX - Publications

Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce Chuang, Vijayalakshmi Srinivasan, and 1 more

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra low precision weights and activations without any significant accuracy degradation. This technique, PArameterized...

10.48550/arxiv.1805.06085 preprint EN cc-by arXiv (Cornell University) 2018-01-01

A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

Bruce Fleischer, Sunil Shukla, Matthew M. Ziegler, J. A. Silberman, Jinwook Oh, and 26 more

A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy as well as 1b/2b (binary/ternary) integer for aggressive performance. At 1.5 GHz, the prototype...

10.1109/vlsic.2018.8502276 article EN 2018-06-01

A Low-Power High-Performance Single-Cycle Tree-Based 64-Bit Binary Comparator

Pierce Chuang, David Li, Manoj Sachdev

A single-cycle 64-bit binary comparator utilizing a radix-2 tree structure is proposed in this brief. This novel architecture is specifically designed for static logic to achieve both low-power and high-performance operation, particularly in environments with low input data activity. This brief presents a detailed performance and power analysis of various state-of-the-art designs across three CMOS technologies. At 65-nm technology, with 25% (10%) data activity, the design demonstrates 2.3× (3.5×) and 3.7× (5.8...

10.1109/tcsii.2011.2180110 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2012-01-18

26.5 Adaptive clocking in the POWER9™ processor for voltage droop protection

Michael S. Floyd, P.J. Restle, Michael Sperling, Pawel Owczarczyk, Eric Fluhr, and 5 more

Increasing transistor counts in modern processors can create instantaneous changes in current, driving nanosecond-speed supply voltage (VDD) droops that require extra guardband for correct product operation. The POWER9 processor uses an adaptive clock strategy to reduce the timing margin needed during power supply droop events by embedding analog voltage-droop monitors (VDMs) that direct a digital...

10.1109/isscc.2017.7870452 article EN 2017 IEEE International Solid-State Circuits Conference (ISSCC) 2017-02-01

Compensated-DNN

Shubham Jain, Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi, Pierce Chuang, and 1 more

Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by the high computation and storage requirements of DNNs, especially for energy-constrained inference at the edge using wearable and IoT devices. One promising approach to alleviate the computational challenges is implementing DNNs using a low-precision fixed point (<16 bits) representation. However, the quantization error inherent in any...

10.1145/3195970.3196012 article EN 2018-06-19

Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)

Jungwook Choi, Pierce Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, and 1 more

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. In order to reduce this cost, several quantization schemes have gained attention recently, with some focusing on weight quantization and others focusing on quantizing activations. This paper proposes novel techniques that target weight and activation quantizations separately, resulting in an overall quantized neural network (QNN). The activation quantization technique, PArameterized Clipping acTivation (PACT), uses clipping...

10.48550/arxiv.1807.06964 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Constant Delay Logic Style

Pierce Chuang, David Li, Manoj Sachdev

A constant delay (CD) logic style is proposed in this paper, targeting full-custom high-speed applications. The constant-delay characteristic of this logic style, regardless of the logic expression type, makes it suitable for implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature offers a performance advantage over static and dynamic domino logic styles in a single-cycle, multistage circuit block. Several design considerations including timing window width...

10.1109/tvlsi.2012.2189423 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2012-03-13

A Scalable Multi-TeraOPS Core for AI Training and Inference

Sunil Shukla, Bruce Fleischer, Matthew M. Ziegler, J. A. Silberman, Jinwook Oh, and 26 more

This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture to provide high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units. A 16b floating point (fp16) representation with 1 sign bit, 6 exponent bits, and 9 mantissa bits has also been developed for high model accuracy in training and inference...

10.1109/lssc.2019.2902738 article EN IEEE Solid-State Circuits Letters 2018-12-01

Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing

Akshat Shrivastava, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav Arora, and 2 more

An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps: encoding an utterance x, predicting a frame's length |y|, and decoding a |y|-sized frame with utterance and ontology tokens. Though empirically strong, these models are typically bottlenecked by length prediction, as even small inaccuracies change the syntactic and semantic characteristics of resulting frames. In our work, we propose span pointer networks, non-autoregressive parsers which shift...

10.18653/v1/2021.findings-emnlp.161 preprint EN cc-by 2021-01-01

Comparative analysis and study of metastability on high-performance flip-flops

David Li, Pierce Chuang, Manoj Sachdev

In this paper, we analyze and characterize the metastability of 11 previously proposed high-performance flip-flops, reduced clock-swing flip-flops, and level-converting flip-flops. From extensive simulation results in 65nm CMOS technology, the main metastability parameters τ and T0 are extracted and analyzed at both nominal and reduced supply voltage. Our results indicate that these flip-flops exhibit a wide range (up to a few orders of magnitude) of metastability windows....

10.1109/isqed.2010.5450482 article EN 2010-03-01

A 167-ps 2.34-mW Single-Cycle 64-Bit Binary Tree Comparator With Constant-Delay Logic in 65-nm CMOS

Pierce Chuang, Manoj Sachdev, Vincent Gaudet

A single-cycle tree-based 64-bit binary comparator with constant-delay (CD) logic realized in a 65-nm, 1-V CMOS process is presented in this paper. Unlike dynamic logic, yet domino-compatible, CD logic predischarges the output to "0" and conditionally makes a transition to "1" through the critical-path CLK PMOS transistors for an NMOS transistor network. The constant delay (regardless of fan-in) feature makes it up to 2× faster than a dynamic logic gate during D-Q mode for complex logic such as a two-bit comparator. The proposed comparator's architecture...

10.1109/tcsi.2013.2268591 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2013-10-11

Compensated-DNN: Energy Efficient Low-Precision Deep Neural Networks by Compensating Quantization Errors

Shubham Jain, Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi, Pierce Chuang, and 1 more

Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by the high computation and storage requirements of DNNs, especially for energy-constrained inference at the edge using wearable and IoT devices. One promising approach to alleviate the computational challenges is implementing DNNs using a low-precision fixed point (<16 bits) representation. However, the quantization error inherent in any...

10.1109/dac.2018.8465893 article EN 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) 2018-06-01

Design and analysis of metastable-hardened and soft-error tolerant high-performance, low-power flip-flops

David Li, David J. Rennie, Pierce Chuang, David Nairn, Manoj Sachdev

In this paper, detailed analysis is given on the design of metastable-hardened and soft-error tolerant flip-flops while maintaining the basic characteristics of low power and high performance. We also propose two new flip-flop designs: the pre-discharge flip-flop (PDFF-SE) and the sense-amplifier transmission-gate flip-flop (SATG-SE). Following our main design approach, both PDFF-SE and SATG-SE use a cross-coupled inverter on the critical path in the master-stage to achieve good metastability while generating the differential signals to facilitate the usage of the Quatro cell...

10.1109/isqed.2011.5770787 article EN 2011-03-01

Droop mitigation using critical-path sensors and an on-chip distributed power supply estimation engine in the z14™ enterprise processor

Christos Vezyrtzis, T. Strach, Pierce Chuang, Preetham Lobo, Richard Rizzolo, and 11 more

Enterprise server processor designs, which operate at extremely high frequencies and power envelopes, depend critically on power supply noise mitigation techniques. With voltage scaling, very high current draws, and broad usage of clock gating, advanced solutions are needed for next-generation products to minimize droop response time, which can be defined as the latency from when a dangerous droop begins until a countermeasure is effective.

10.1109/isscc.2018.8310303 article EN 2018 IEEE International Solid-State Circuits Conference (ISSCC) 2018-02-01

The 24-Core POWER9 Processor With Adaptive Clocking, 25-Gb/s Accelerator Links, and 16-Gb/s PCIe Gen4

Christopher González, Michael S. Floyd, Eric Fluhr, P.J. Restle, Daniel Dreps, and 18 more

The POWER9™ family of chips is fabricated in 14-nm silicon-on-insulator finFET technology using 17 levels of copper interconnect. The 695-mm² 24-core microprocessor features a new core based on an execution slice microarchitecture. The chip contains 8 billion transistors and has 120 MB of eDRAM L3 cache. The processor uses an adaptive clock strategy to reduce the timing margin needed during power supply droop events by...

10.1109/jssc.2017.2748623 article EN IEEE Journal of Solid-State Circuits 2017-12-14

26.2 Power supply noise in a 22nm z13™ microprocessor

Pierce Chuang, Christos Vezyrtzis, Divya Pathak, Richard Rizzolo, Tobias Webel, and 10 more

Successful power supply noise mitigation requires a system-level approach that includes design and modeling of the circuits with the power delivery network (PDN) on the chip, chip module, backplane, and voltage regulator module (VRM). Traditionally, periodic square-wave activity patterns with all cores in sync, which yield low-frequency (LF) or mid-frequency (MF) noise from impedance peaks associated with the backplane or chip/module, respectively, are considered to give rise to worst case noise. However, droops are both deeper and faster at...

10.1109/isscc.2017.7870449 article EN 2017 IEEE International Solid-State Circuits Conference (ISSCC) 2017-02-01

Lumos: Empowering Multimodal LLMs with Scene Text Recognition

Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, and 9 more

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images, the output of which is used to augment input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges, and discuss the system architecture, design choices,...

10.48550/arxiv.2402.08017 preprint EN arXiv (Cornell University) 2024-02-12

Design of a 64-bit low-energy high-performance adder using dynamic feedthrough logic

Pierce Chuang, David Li, Manoj Sachdev

In this work, a new design approach for implementing a low-energy, high-performance 64-bit adder using dynamic feedthrough logic (DFTL) is introduced and analyzed. Design issues of DFTL and several logic depths are analyzed in order to achieve the best balance between performance and power consumption. A "timing window" technique is also proposed to reduce the amount of excessive power dissipation in the approach. A 64-bit Sklansky carry-merge adder is used as a benchmark for comparison of different logic styles including DFTL, CDL, dynamic, static...

10.1109/iscas.2009.5118443 article EN 2009 IEEE International Symposium on Circuits and Systems 2009-05-01

Design and analysis of metastable-hardened flip-flops in sub-threshold region

David Li, Pierce Chuang, David Nairn, Manoj Sachdev

Flip-flop metastability is becoming an important consideration for designing reliable synchronous and asynchronous systems, especially in the sub-threshold region where it degrades exponentially with the reduction of supply voltage. In this paper, detailed analysis is given on the design of metastable-hardened flip-flops in the sub-threshold region. Proper transistor sizing using either transconductance or load variation, along with implementing a low-Vth inverter pair in the flip-flop master-stage, can result in significant time-resolving...

10.5555/2016802.2016844 article EN International Symposium on Low Power Electronics and Design 2011-08-01

IBM z14: Processor Characterization and Power Management for High-Reliability Mainframe Systems

Christopher Berry, David Wolpert, Christos Vezyrtzis, Richard Rizzolo, Seán Carey, and 13 more

The IBM z14 is the latest update in the storied history of IBM mainframes. Reliability, availability, security, and scalability are the foundation of the mainframe line. System reliability and availability targets are in excess of 10 years, requiring rigorous chip characterization processes. In this paper, we discuss some of the many processes used to ensure that lifetime. An additional part of reliability is power management (PM). The 5.2-GHz high-power design of the central processor requires advanced on-die PM capabilities to adapt to intensive instruction...

10.1109/jssc.2018.2873582 article EN IEEE Journal of Solid-State Circuits 2018-11-13

Omni-Sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR Via Supernet

Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, and 4 more

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs model size, researchers are trapped in a dilemma of optimizing model accuracy by training and fine-tuning models for each individual edge device while keeping the training GPU-hours tractable. In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. We develop...

10.1109/icassp43922.2022.9746469 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27
