Pierce Chuang

ORCID: 0000-0001-5850-7048
Research Areas
  • Low-power high-performance VLSI design
  • Advanced Neural Network Applications
  • Advancements in Semiconductor Devices and Circuit Design
  • Analog and Mixed-Signal Circuit Design
  • Advanced Memory and Neural Computing
  • Natural Language Processing Techniques
  • Topic Modeling
  • Neural Networks and Applications
  • Quantum-Dot Cellular Automata
  • Semiconductor materials and devices
  • Adversarial Robustness in Machine Learning
  • Advancements in PLL and VCO Technologies
  • Speech Recognition and Synthesis
  • Domain Adaptation and Few-Shot Learning
  • Electromagnetic Compatibility and Noise Suppression
  • Parallel Computing and Optimization Techniques
  • Music and Audio Processing
  • Speech and Audio Processing
  • Text Readability and Simplification
  • Electrostatic Discharge in Electronics
  • Sparse and Compressive Sensing Techniques
  • Machine Learning and Data Classification
  • Genomics and Phylogenetic Studies
  • VLSI and Analog Circuit Testing
  • Machine Learning and ELM

META Health
2023-2024

Meta (Israel)
2021-2022

Meta (United States)
2020

IBM Research - Thomas J. Watson Research Center
2018

Alliance for Safe Kids
2018

IBM (United States)
2015-2018

University of Waterloo
2009-2015

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. This technique, PArameterized...

10.48550/arxiv.1805.06085 preprint EN cc-by arXiv (Cornell University) 2018-01-01

A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy as well as 1b/2b (binary/ternary) integer for aggressive performance. At 1.5 GHz, the prototype...

10.1109/vlsic.2018.8502276 article EN 2018-06-01

A single-cycle 64-bit binary comparator utilizing a radix-2 tree structure is proposed in this brief. This novel architecture is specifically designed for static logic to achieve both low-power and high-performance operation, particularly in low-input data activity environments. This brief presents a detailed performance and power analysis of various state-of-the-art comparator designs across three CMOS technologies. At 65-nm technology, with 25% (10%) data activity, the proposed design demonstrates 2.3× (3.5×) and 3.7× (5.8...

10.1109/tcsii.2011.2180110 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2012-01-18

Increasing transistor counts in modern processors can create instantaneous changes in current, driving nanosecond-speed supply voltage (V_DD) droops that require extra guardband for correct product operation. The POWER9 processor uses an adaptive clock strategy to reduce the timing margin needed during power supply droop events by embedding analog voltage-droop monitors (VDMs) that direct a digital...

10.1109/isscc.2017.7870452 article EN 2022 IEEE International Solid-State Circuits Conference (ISSCC) 2017-02-01

Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by the high computation and storage requirements of DNNs, especially for energy-constrained inference at the edge using wearable and IoT devices. One promising approach to alleviate the computational challenges is implementing DNNs using low-precision fixed point (<16 bits) representation. However, the quantization error inherent in any...

10.1145/3195970.3196012 article EN 2018-06-19

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. In order to reduce this cost, several quantization schemes have gained attention recently, with some focusing on weight quantization and others focusing on quantizing activations. This paper proposes novel techniques that target weight and activation quantizations separately, resulting in an overall quantized neural network (QNN). The activation quantization technique, PArameterized Clipping acTivation (PACT), uses an activation clipping...

10.48550/arxiv.1807.06964 preprint EN other-oa arXiv (Cornell University) 2018-01-01
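
As a rough illustration of the clipping idea named in the abstract above, the sketch below clips one activation to a range [0, alpha] and then rounds it onto a uniform k-bit grid. The function name `pact_quantize`, its parameter names, and the fixed value of `alpha` are assumptions made for this example; the abstract describes a clipping level that is tuned during training, which this sketch does not attempt to show.

```python
def pact_quantize(x, alpha, bits):
    """Clip a scalar activation to [0, alpha], then quantize it to `bits` bits.

    Illustrative only: `alpha` is passed in as a constant here, whereas a
    trained network would learn it alongside the weights.
    """
    # Number of quantization steps between 0 and alpha.
    levels = 2 ** bits - 1
    scale = levels / alpha
    # Clipping step: values below 0 become 0, values above alpha become alpha.
    clipped = min(max(x, 0.0), alpha)
    # Uniform quantization: snap to the nearest of the `levels + 1` grid points.
    return round(clipped * scale) / scale


# Hypothetical usage: 2-bit activations with a clipping level of 6.0.
quantized = [pact_quantize(v, alpha=6.0, bits=2) for v in (-1.0, 2.9, 10.0)]
```

With 2 bits and `alpha=6.0`, the representable outputs are 0.0, 2.0, 4.0, and 6.0, so a smaller `alpha` gives finer resolution at the cost of clipping more large activations.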

A constant delay (CD) logic style is proposed in this paper, targeting full-custom high-speed applications. The constant delay characteristic of this logic style, regardless of the logic type, makes it suitable for implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature offers a performance advantage over static and dynamic domino logic styles in a single-cycle multistage circuit block. Several design considerations including timing window width...

10.1109/tvlsi.2012.2189423 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2012-03-13

This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture to provide high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units. A 16b floating point (fp16) representation with 1 sign bit, 6 exponent bits, and 9 mantissa bits has also been developed for high model accuracy in training and inference...

10.1109/lssc.2019.2902738 article EN IEEE Solid-State Circuits Letters 2018-12-01
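
To make the 1/6/9 bit split in the abstract above concrete, here is a minimal sketch that decodes a 16-bit word laid out as 1 sign bit, 6 exponent bits, and 9 mantissa bits. The function name, the exponent bias of 31, the implicit leading one, and the absence of any special-value handling (zero, infinity, NaN, subnormals) are all assumptions for illustration; the abstract specifies only the field widths.

```python
def decode_fp16_1_6_9(word, bias=31):
    """Decode a 16-bit word with 1 sign, 6 exponent, and 9 mantissa bits.

    Assumed layout (not confirmed by the abstract): sign in bit 15,
    exponent in bits 14..9, mantissa in bits 8..0, implicit leading 1,
    and an exponent bias of 31. Special encodings are not handled.
    """
    sign = (word >> 15) & 0x1
    exponent = (word >> 9) & 0x3F   # 6 exponent bits
    mantissa = word & 0x1FF         # 9 mantissa bits
    # 9 mantissa bits give a fractional step of 1/512.
    value = (1.0 + mantissa / 512.0) * 2.0 ** (exponent - bias)
    return -value if sign else value


# Hypothetical usage: biased exponent 31, zero mantissa.
one = decode_fp16_1_6_9(31 << 9)
```

Compared with the IEEE 754 half-precision split of 5 exponent and 10 mantissa bits, this layout trades one bit of precision for a wider exponent range.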

An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps: encoding an utterance x, predicting a frame's length |y|, and decoding a |y|-sized frame with utterance and ontology tokens. Though empirically strong, these models are typically bottlenecked by length prediction, as even small inaccuracies change the syntactic and semantic characteristics of resulting frames. In our work, we propose span pointer networks, non-autoregressive parsers which shift...

10.18653/v1/2021.findings-emnlp.161 preprint EN cc-by 2021-01-01

In this paper, we analyze and characterize the metastability of 11 previously proposed high-performance flip-flops, reduced clock-swing flip-flops, and level-converting flip-flops. From extensive simulation results in 65-nm CMOS technology, the main metastability parameters τ and T_0 are extracted and analyzed at both nominal and reduced supply voltage. Our results indicate that these flip-flops exhibit a wide range (up to a few orders of magnitude) of metastability windows....

10.1109/isqed.2010.5450482 article EN 2010-03-01
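
For context on why the two metastability parameters in the abstract above matter, the sketch below evaluates the textbook synchronizer reliability estimate MTBF = exp(t_r / tau) / (T0 * f_clk * f_data). This is the standard model, not necessarily the extraction method used in the paper, and the function name, parameter names, and numeric values are hypothetical.

```python
import math


def mtbf_seconds(t_resolve, tau, t0, f_clk, f_data):
    """Estimate mean time between metastability failures, in seconds.

    t_resolve: time allowed for the flip-flop to settle (s)
    tau:       resolution time constant of the flip-flop (s)
    t0:        metastability window parameter (s)
    f_clk:     clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)


# Hypothetical values: halving tau has an exponential effect on MTBF.
slow = mtbf_seconds(t_resolve=0.5e-9, tau=20e-12, t0=100e-12, f_clk=1e9, f_data=1e8)
fast = mtbf_seconds(t_resolve=0.5e-9, tau=10e-12, t0=100e-12, f_clk=1e9, f_data=1e8)
```

Because tau sits in the exponent, a modest change in it shifts MTBF by orders of magnitude, whereas T0 only scales the result linearly.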

A single-cycle tree-based 64-bit binary comparator with constant-delay (CD) logic realized in a 65-nm, 1-V CMOS process is presented in this paper. Unlike dynamic logic, yet domino-compatible, CD logic predischarges the output to "0" and conditionally makes the transition to "1" through the critical-path CLK PMOS transistors for an NMOS transistor network. The constant delay (regardless of fan-in) feature makes it up to 2× faster than a dynamic logic gate during D-Q mode for complex logic such as a two-bit comparator. The proposed comparator's architecture...

10.1109/tcsi.2013.2268591 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2013-10-11

Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by the high computation and storage requirements of DNNs, especially for energy-constrained inference at the edge using wearable and IoT devices. One promising approach to alleviate the computational challenges is implementing DNNs using low-precision fixed point (<16 bits) representation. However, the quantization error inherent in any...

10.1109/dac.2018.8465893 article EN 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) 2018-06-01

In this paper, a detailed analysis is given on the design of metastable-hardened and soft-error tolerant flip-flops while maintaining the basic characteristics of low power and high performance. We also propose two new flip-flop designs: pre-discharge (PDFF-SE) and sense-amplifier transmission-gate (SATG-SE). Following our main design approach, both PDFF-SE and SATG-SE use a cross-coupled inverter on the critical path in the master-stage to achieve good metastability while generating differential signals to facilitate the usage of the Quatro cell...

10.1109/isqed.2011.5770787 article EN 2011-03-01

Enterprise server processor designs, which operate at extremely high frequencies and power envelopes, depend critically on power supply noise mitigation techniques. With voltage scaling, very high current draws, and broad usage of clock gating, advanced solutions are needed for next-generation products to minimize droop response time, which can be defined as the latency from when a dangerous droop begins until a countermeasure is effective.

10.1109/isscc.2018.8310303 article EN 2022 IEEE International Solid-State Circuits Conference (ISSCC) 2018-02-01

The POWER9™ family of chips is fabricated in 14-nm silicon-on-insulator finFET technology using 17 levels of copper interconnect. The 695-mm² 24-core microprocessor features a new core based on an execution slice microarchitecture. The chip contains 8 billion transistors and has 120 MB of eDRAM L3 cache. The processor uses an adaptive clock strategy to reduce the timing margin needed during power supply droop events by...

10.1109/jssc.2017.2748623 article EN IEEE Journal of Solid-State Circuits 2017-12-14

Successful power supply noise mitigation requires a system-level approach that includes design and modeling of the circuits with the power delivery network (PDN) on the chip, chip module, backplane, and voltage regulator module (VRM). Traditionally, periodic square-wave activity patterns with all cores in sync, which yield low-frequency (LF) or mid-frequency (MF) impedance peaks associated with the backplane or chip/module, respectively, are considered to give rise to the worst-case noise. However, droops are both deeper and faster at...

10.1109/isscc.2017.7870449 article EN 2022 IEEE International Solid-State Circuits Conference (ISSCC) 2017-02-01

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first-person point-of-view images, the output of which is used to augment the input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges, and discuss the system architecture, design choices,...

10.48550/arxiv.2402.08017 preprint EN arXiv (Cornell University) 2024-02-12

In this work, a new design approach in implementing a low-energy, high-performance 64-bit adder using dynamic feedthrough logic (DFTL) is introduced and analyzed. Design issues of DFTL and several logic depths are analyzed in order to achieve an optimal balance between performance and power consumption. A "timing window" technique is also proposed to reduce the amount of excessive power dissipation in this approach. A 64-bit Sklansky carry-merge adder is used as the benchmark for comparison between different logic styles including DFTL, CDL, dynamic, and static...

10.1109/iscas.2009.5118443 article EN 1993 IEEE International Symposium on Circuits and Systems 2009-05-01

Flip-flop metastability is becoming an important consideration for designing reliable synchronous and asynchronous systems, especially in the sub-threshold region where it degrades exponentially with the reduction of the supply voltage. In this paper, a detailed analysis is given on the design of metastable-hardened flip-flops in the sub-threshold region. Proper transistor sizing using either transconductance or load variation, along with implementing the inverter pair in the flip-flop master-stage with low-V_th transistors, can result in significant time-resolving...

10.5555/2016802.2016844 article EN International Symposium on Low Power Electronics and Design 2011-08-01

The IBM z14 is the latest update in the storied history of IBM mainframes. Reliability, availability, security, and scalability are the foundation of the mainframe line. System reliability and availability targets are in excess of 10 years, requiring rigorous chip characterization processes. In this paper, we discuss some of the many processes used to ensure that lifetime. An additional part of reliability is power management (PM). The 5.2-GHz high-power design of the central processor requires advanced on-die PM capabilities to adapt to intensive instruction...

10.1109/jssc.2018.2873582 article EN IEEE Journal of Solid-State Circuits 2018-11-13

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs. model size, researchers are trapped in a dilemma of optimizing model accuracy by training and fine-tuning models for each individual edge device while keeping the training GPU-hours tractable. In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. We develop...

10.1109/icassp43922.2022.9746469 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27