- Advanced Memory and Neural Computing
- Advanced Neural Network Applications
- Advanced Data Storage Technologies
- Semiconductor Materials and Devices
- Parallel Computing and Optimization Techniques
- Speech Recognition and Synthesis
- Speech and Audio Processing
- Ferroelectric and Negative Capacitance Devices
- Topic Modeling
- Cellular Automata and Applications
- Advanced Image and Video Retrieval Techniques
- Numerical Methods and Algorithms
- Robotics and Sensor-Based Localization
- Music and Audio Processing
- Blind Source Separation Techniques
- Interconnection Networks and Systems
- Adversarial Robustness in Machine Learning
- Distributed and Parallel Computing Systems
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Embedded Systems Design Techniques
- Cloud Computing and Resource Management
- CCD and CMOS Imaging Sensors
- Advanced Vision and Imaging
- Robotic Path Planning Algorithms
Harvard University Press
2020-2024
Harvard University
2022
National Tsing Hua University
2001-2018
Powerchip (Taiwan)
2005
Foxnum Technology (Taiwan)
2005
Taiwan Semiconductor Manufacturing Company (Taiwan)
2003
Many artificial intelligence (AI) edge devices use nonvolatile memory (NVM) to store the weights of neural networks (trained off-line on an AI server) and require low-energy, fast I/O accesses. The deep neural networks (DNNs) used by processors [1,2] commonly comprise p layers of a convolutional network (CNN) and q layers of a fully-connected network (FCN). Current DNNs on conventional (von Neumann) architectures are limited by high access latencies, energy consumption, and hardware costs. Large working data sets result in heavy memory accesses...
For deep-neural-network (DNN) processors [1-4], the product-sum (PS) operation dominates the computational workload for both convolution (CNVL) and fully-connected (FCNL) neural-network (NN) layers. This hinders the adoption of DNNs in edge artificial-intelligence (AI) devices, which require low-power, low-cost, fast inference. Binary DNNs [5-6] are used to reduce the computation and hardware costs of AI devices; however, a memory bottleneck still remains. In Fig. 31.5.1, conventional PE arrays exploit...
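As a minimal sketch (illustrative only, not the chip's implementation), the product-sum operation the abstract refers to is the dot product underlying both fully-connected layers and im2col-unrolled convolutions; the binarized variant of [5-6] reduces each multiply to a sign comparison (an XNOR in hardware):

```python
def product_sum(weights, inputs):
    """One output neuron: the product-sum (dot product) of a weight row
    with the input vector."""
    return sum(w * x for w, x in zip(weights, inputs))

def fc_layer(weight_matrix, inputs):
    """A fully-connected layer is one product-sum per output neuron."""
    return [product_sum(row, inputs) for row in weight_matrix]

def binary_product_sum(weights, inputs):
    """Binarized product-sum: weights and activations in {-1, +1},
    so each multiply collapses to +1 when signs match, -1 otherwise."""
    return sum(1 if w == x else -1 for w, x in zip(weights, inputs))
```

A convolution layer reduces to the same primitive once each receptive field is unrolled into a vector, which is why PS throughput dominates the workload for both layer types.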
Transformer-based language models such as BERT provide significant accuracy improvements for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy on resource-constrained edge platforms with strict latency requirements.
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low precision: their shrunken dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present an algorithm-hardware co-design centered around a novel floating-point-inspired number format, AdaptivFloat, which dynamically maximizes and optimally clips its available range at a layer granularity in order to create a faithful...
Automatic speech recognition (ASR) using deep learning is essential for user interfaces on IoT devices. However, previously published ASR chips [4-7] do not consider realistic operating conditions, which are typically noisy and may include more than one speaker. Furthermore, several of these works have implemented only small-vocabulary tasks, such as keyword spotting (KWS), where context-blind deep neural network (DNN) algorithms are adequate. For large-vocabulary tasks (e.g., >100k words), the complex...
Modern heterogeneous SoCs feature a mix of many hardware accelerators and general-purpose cores that run applications in parallel. This brings challenges in managing how they access shared resources, e.g., the memory hierarchy, communication channels, and on-chip power. We address these challenges through flexible orchestration of data on a 74Tbps network-on-chip (NoC) for dynamic management of resources under contention and a distributed power-management (DHPM) scheme. Developing and testing these ideas requires a comprehensive evaluation platform....
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes: their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point-inspired number representation format for deep learning that dynamically maximizes and optimally clips its available range at a layer granularity in order to create a faithful encoding of neural network...
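The per-layer range adaptation can be sketched as follows. This is a simplified illustration of the idea, not the published AdaptivFloat algorithm: the exponent bias is chosen per tensor (here by a hypothetical rule) so that the small float format's representable range covers the tensor's largest magnitude, and each value is then rounded to the nearest representable point:

```python
import math

def adaptive_float_quantize(values, exp_bits=3, man_bits=4):
    """Quantize to a small float-like format whose exponent bias is
    picked per layer so the format's range covers max(|values|)."""
    max_abs = max(abs(v) for v in values)
    exp_max_unbiased = 2 ** exp_bits - 1
    # Hypothetical bias choice: align the format's top exponent with
    # the tensor's largest magnitude (the per-layer adaptation idea).
    bias = exp_max_unbiased - math.floor(math.log2(max_abs))
    out = []
    for v in values:
        if v == 0:
            out.append(0.0)
            continue
        sign = -1.0 if v < 0 else 1.0
        e = math.floor(math.log2(abs(v)))
        # Clip the exponent into the representable (biased) range.
        e = max(min(e, exp_max_unbiased - bias), -bias)
        frac = abs(v) / 2 ** e
        # Round the mantissa to man_bits fractional bits.
        frac_q = round(frac * 2 ** man_bits) / 2 ** man_bits
        out.append(sign * frac_q * 2 ** e)
    return out
```

Because the bias tracks each layer's own distribution, small-magnitude layers keep fine resolution while large-magnitude layers are not clipped, which is the property fixed-point formats lose at low word sizes.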
The proliferation of personal artificial intelligence (AI) assistant technologies with speech-based conversational AI interfaces is driving exponential growth in the consumer Internet of Things (IoT) market. As these are being applied to keyword spotting (KWS), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications, it is of paramount importance that they provide uncompromising performance for context learning over long sequences, which is a key benefit...
Autonomous machines (e.g., vehicles, mobile robots, drones) require sophisticated 3D mapping to perceive the dynamic environment. However, maintaining a real-time map is expensive in terms of both compute and memory requirements, especially for resource-constrained edge machines. Probabilistic OctoMap is a reliable, memory-efficient dense model for representing the full environment, with voxel node pruning and expansion capacity. It is widely used but limited by its single-thread design. This paper presents the first...
The design of heterogeneous systems that include domain-specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, application domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop-level, task-level, and pipeline parallelism. To assist this process and expose every possible level of parallelism, we...
Deep neural networks (DNNs) have become ubiquitous and dominant in various application domains due to their state-of-the-art learning capabilities. To run compute- and memory-intensive DNN models, designing specialized hardware accelerators has become the common choice. However, this performance improvement comes with limitations on programmability, which has become crucial given the rapid evolution of models. In this work, we first conduct a workload analysis on a diverse set of models including CNN, LSTM, Transformer, GCN...
An AND-type split-gate Flash memory cell with a trench select gate and buried n+ source is proposed. This cell, programmed by ballistic side injection (BSSI), can provide high programming efficiency with a cell size of 5F². Furthermore, both the read speed and the read current are enhanced by the shared-source configuration.
The design of heterogeneous systems that include domain-specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, application domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop-level, task-level, and pipeline parallelism. To assist this process and expose every possible parallelism, we...
In this paper, a recently proposed bidirectional tunneling program/erase (P/E) NOR-type (BiNOR) flash memory is extensively investigated. With the designated localized p-well structure, uniform Fowler-Nordheim (FN) tunneling is fulfilled for the first time for both program and erase operations in a NOR array architecture, facilitating low-power applications. The BiNOR cell guarantees excellent tunnel oxide reliability while providing fast random-access capability. Furthermore, a three-dimensional (3D) current path, in addition...
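As background for the FN-tunneling P/E mechanism above (a textbook relation, not taken from this abstract), the Fowler-Nordheim current density through the tunnel oxide is commonly modeled as

```latex
J_{FN}(E_{ox}) = A\,E_{ox}^{2}\,\exp\!\left(-\frac{B}{E_{ox}}\right)
```

where $E_{ox}$ is the oxide electric field and $A$, $B$ are constants set by the tunneling barrier height and carrier effective mass. Because $J_{FN}$ depends only on the field across the oxide, dividing the P/E voltage between two terminals (as bidirectional schemes do) lowers the voltage each terminal must supply while keeping $E_{ox}$ high enough for tunneling.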
A novel 3D flash memory, BiNOR, with a localized shallow P-well is proposed for high-speed, low-power, and high-reliability applications. Low-power bi-directional tunneling program/erase is realized in a NOR array, which guarantees better tunnel oxide reliability and which previously could only be performed in NAND arrays. Moreover, read performance is improved by a more than 15% conduction-current enhancement due to the cell structure.
Transformer-based language models such as BERT provide significant accuracy improvements for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy on resource-constrained edge platforms with strict latency requirements. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization of multi-task NLP. EdgeBERT employs entropy-based early exit predication in order to perform dynamic...
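The entropy-based early-exit idea can be sketched as follows. This is a simplified stand-in, not EdgeBERT's implementation: each transformer layer has an attached classifier, and inference stops at the first layer whose softmax output is confident enough (low entropy), skipping the remaining layers:

```python
import math

def entropy(probs):
    """Shannon entropy of a softmax distribution (low = confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit(layer_outputs, threshold=0.5):
    """Walk the per-layer classifier outputs in order; exit as soon as
    the output entropy drops below the threshold.
    `layer_outputs` stands in for the softmax produced after each layer."""
    for depth, probs in enumerate(layer_outputs, start=1):
        if entropy(probs) < threshold:
            return depth, probs  # confident: skip the remaining layers
    return len(layer_outputs), layer_outputs[-1]
```

Easy inputs exit after a few layers while hard inputs use the full stack, which is what makes the average latency (and energy) input-dependent rather than fixed.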
Autonomous machines (e.g., vehicles, mobile robots, drones) require sophisticated 3D mapping to perceive the dynamic environment. However, maintaining a real-time map is expensive in terms of both compute and memory requirements, especially for resource-constrained edge machines. Probabilistic OctoMap is a reliable, memory-efficient dense model for representing the full environment, with voxel node pruning and expansion capacity. This paper presents the first efficient accelerator solution, i.e., OMU, to enable...
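The probabilistic update and pruning the abstract mentions can be sketched as follows, assuming OctoMap's usual log-odds occupancy model with clamping (parameter values below are typical defaults, not taken from this paper):

```python
import math

def logodds(p):
    """Convert a probability to log-odds, OctoMap's per-voxel state."""
    return math.log(p / (1 - p))

L_HIT, L_MISS = logodds(0.7), logodds(0.4)   # per-measurement updates
L_MIN, L_MAX = logodds(0.12), logodds(0.88)  # clamping thresholds

def update_voxel(l, hit):
    """Bayesian log-odds update with clamping; clamping lets stable
    voxels reach identical extreme values, enabling pruning."""
    l += L_HIT if hit else L_MISS
    return max(L_MIN, min(L_MAX, l))

def can_prune(children):
    """A parent octree node can replace its 8 children when they all
    hold the same clamped (fully free or fully occupied) value."""
    return all(c == children[0] and c in (L_MIN, L_MAX) for c in children)
```

Pruning is what keeps the dense map memory-efficient, but each update touches a root-to-leaf path of the octree, which is why the single-threaded CPU implementation becomes the bottleneck that an accelerator can address.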
For the first time, a new flash cell, called buried bit-line AND (BiAND), is proposed. The buried bit-line can achieve low-voltage programming/erase. The major difference of the current cell from the conventional one lies in the specially designed contact. With the use of the buried bit-line, the high voltage required for FN-tunneling program/erase can be divided between the word-line and the bit-line, such that lower-voltage operation is feasible. Further, a comparison of the reliability of the different operating schemes, i.e., the high-voltage F-N (HV F-N) and Bi operating schemes, has been studied. Results show that the BiAND scheme gives much better...
In this work, we present SM6, an SoC architecture for real-time denoised speech and NLP pipelines, featuring (1) MSSE: an unsupervised probabilistic sound source separation accelerator, (2) FlexNLP: a programmable inference accelerator for attention-based seq2seq DNNs using adaptive floating-point datatypes for wide-dynamic-range computations, and (3) a dual-core Arm Cortex-A53 CPU cluster, which provides on-demand SIMD FFT processing and operating system support. In adverse acoustic conditions, MSSE allows...
This paper presents a novel bi-directional channel FN tunneling program/erase NOR-type (BiNOR) flash memory cell for reliable, high-speed, and low-power operation. With a localized shallow p-well at the bit-line, BiNOR realizes channel FN tunneling P/E in a NOR-type array architecture, which previously could only be done in a NAND architecture. Furthermore, the read current is greatly enhanced by a 3-D conduction effect due to the designated p-well.