- Advanced Neural Network Applications
- Neural Networks and Applications
- Machine Learning and Data Classification
- Domain Adaptation and Few-Shot Learning
- Speech Recognition and Synthesis
- Speech and Audio Processing
- Model Reduction and Neural Networks
- Sparse and Compressive Sensing Techniques
- Adversarial Robustness in Machine Learning
- Digital Filter Design and Implementation
- Topic Modeling
- Machine Learning and ELM
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Advanced Image Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Stock Market Forecasting Methods
- Anomaly Detection Techniques and Applications
- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Advanced Data Compression Techniques
- Music and Audio Processing
Rebellion (United Kingdom)
2022
Seoul National University
2017-2021
Fixed-point optimization of deep neural networks plays an important role in hardware-based design and low-power implementations. Many networks show fairly good performance even with 2- or 3-bit precision when the quantized weights are fine-tuned by retraining. We propose an improved fixed-point optimization algorithm that estimates the quantization step size dynamically during retraining. In addition, a gradual quantization scheme is also tested, which sequentially applies fixed-point optimizations from high- to low-precision. The experiments are conducted for...
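A minimal sketch of the core idea, not the paper's exact estimator: the quantization step size is re-computed from the current weight statistics every time the weights are quantized during retraining, instead of being fixed once before fine-tuning. The estimator below (based on the standard deviation) is an illustrative assumption.

```python
import numpy as np

def quantize_adaptive(w, num_bits=2):
    """Uniform symmetric quantization with a step size estimated from the
    current weight statistics (illustrative; the paper's estimator may differ)."""
    delta = 2.0 * np.std(w) / (2 ** (num_bits - 1))   # step size tracks the weights
    max_level = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(w / delta), -max_level - 1, max_level)
    return q * delta, delta

# During retraining, quantization is applied after each weight update so the
# step size follows the evolving weight distribution.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256))
w_q, delta = quantize_adaptive(w, num_bits=2)
print(delta, np.unique(w_q).size)
```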
The quantization of deep neural networks (QDNNs) has been actively studied for deployment in edge devices. Recent studies employ the knowledge distillation (KD) method to improve the performance of quantized networks. In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ). SPEQ is a KD training scheme; however, the teacher is formed by sharing the model parameters of the student network. We obtain soft labels by randomly changing the bit precision of the activation stochastically at each layer of the forward-pass computation. The student is trained...
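A hedged PyTorch sketch of the idea as described above, assuming a toy MLP: the teacher and student share the same weights, and the teacher's soft labels come from a forward pass whose activation bit-width is drawn at random per layer. The helpers `act_quant` and `forward` are illustrative names, not the paper's code.

```python
import torch
import torch.nn.functional as F

def act_quant(x, bits):
    """Uniform quantization of non-negative activations (illustrative).
    A straight-through estimator would be used in practice; omitted here."""
    scale = (2 ** bits - 1) / x.max().clamp(min=1e-8)
    return torch.round(x * scale) / scale

def forward(x, weights, bits_per_layer):
    """Shared-parameter forward pass; only the activation precision differs."""
    h = x
    for i, (w, b) in enumerate(zip(weights, bits_per_layer)):
        h = h @ w
        if i < len(weights) - 1:          # hidden layers: ReLU + quantized activation
            h = act_quant(F.relu(h), b)
    return h

# Hypothetical 3-layer MLP; teacher and student share `weights`.
weights = [torch.randn(32, 64, requires_grad=True),
           torch.randn(64, 64, requires_grad=True),
           torch.randn(64, 10, requires_grad=True)]
x = torch.randn(8, 32)

student_bits = [2, 2, 2]                                        # target precision
teacher_bits = [int(torch.randint(2, 9, ())) for _ in weights]  # random bits per layer

with torch.no_grad():
    soft_labels = F.softmax(forward(x, weights, teacher_bits), dim=-1)
logits = forward(x, weights, student_bits)
loss = F.kl_div(F.log_softmax(logits, dim=-1), soft_labels, reduction="batchmean")
loss.backward()
```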
The Transformer model adopts a self-attention structure and shows very good performance in various natural language processing tasks. However, it is difficult to implement on embedded systems because of its large size. In this study, we quantize the parameters and hidden signals for complexity reduction. Not only the weight and embedding matrices but also the inputs and softmax outputs are quantized to utilize low-precision matrix multiplication. The fixed-point optimization steps consist of quantization sensitivity...
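The sketch below shows the general mechanism that makes this useful, under illustrative bit-widths: once both operands of a matrix multiplication are quantized to integer codes with known scales, the product can be computed in low-precision integer arithmetic and rescaled afterwards. It is not the paper's exact scheme.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization; returns integer codes and the scale."""
    max_level = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / max_level + 1e-12
    return np.clip(np.round(x / scale), -max_level, max_level).astype(np.int32), scale

rng = np.random.default_rng(0)
act = rng.normal(size=(4, 64))      # e.g. hidden signals entering a projection layer
w   = rng.normal(size=(64, 64))     # e.g. a weight matrix of the Transformer
a_q, a_s = quantize(act, bits=8)
w_q, w_s = quantize(w, bits=6)
y = (a_q @ w_q) * (a_s * w_s)       # integer matmul, single float rescale
print(np.max(np.abs(y - act @ w)))  # quantization error of the product
```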
The growing computational demands of AI inference have led to the widespread use of hardware accelerators for different platforms, spanning from the edge to the datacenter/cloud. Certain application areas, such as high-frequency trading (HFT) [1–2], have a hard latency deadline for successful execution. We present our new accelerator, which achieves high compute capability with outstanding single-stream responsiveness for demanding service-level objective (SLO)-based services and pipelined applications, including large...
Most deep neural networks (DNNs) require complex models to achieve high performance. Parameter quantization is widely used for reducing the implementation complexity. Previous studies on quantization were mostly based on extensive simulation using the training data of a specific model. We choose a different approach and attempt to measure the per-parameter capacity of DNN models, interpreting the results to obtain insights on the optimum quantization of parameters. This research uses artificially generated data and generic forms of fully connected DNNs, convolutional...
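A toy sketch of how per-parameter capacity could be probed, purely to illustrate the measurement idea and not the paper's protocol: train a small network to memorize artificially generated data with random labels, check whether it can fit them, and relate the dataset size to the parameter count. All sizes and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def memorization_probe(n_samples, hidden=32, steps=2000):
    """Train a tiny MLP on random data/labels and report training accuracy
    and a crude samples-per-parameter ratio (illustrative only)."""
    torch.manual_seed(0)
    x = torch.randn(n_samples, 16)
    y = torch.randint(0, 2, (n_samples,))
    model = torch.nn.Sequential(torch.nn.Linear(16, hidden), torch.nn.ReLU(),
                                torch.nn.Linear(hidden, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
    n_params = sum(p.numel() for p in model.parameters())
    return acc, n_samples / n_params

for n in (200, 800, 3200):
    print(n, memorization_probe(n))
```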
Deep neural networks (DNNs) usually demand a large number of operations for real-time inference. In particular, fully-connected layers contain a large number of weights, and thus they need many off-chip memory accesses. We propose a weight compression method for deep neural networks, which allows values of +1 or -1 only at predetermined positions of the weights so that decoding using a table can be conducted easily. For example, the structured sparse (8,2) coding allows at most two non-zero values among eight weights. This not only enables...
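A short sketch of (8,2) structured sparse ternary coding as described above: within every group of eight weights, at most the two largest-magnitude entries are kept as ±1 times a scale and the rest are zeroed; the fixed group structure is what keeps table-based decoding simple. The per-layer scale used here is an illustrative assumption.

```python
import numpy as np

def structured_sparse_code(w, group=8, nonzero=2):
    """(group, nonzero) structured sparse ternary coding (illustrative).
    At most `nonzero` positions per group of `group` weights keep +1/-1
    (times a per-layer scale); all other positions become zero."""
    flat = w.reshape(-1, group)
    scale = np.mean(np.abs(flat))                            # illustrative scale
    coded = np.zeros_like(flat)
    idx = np.argsort(-np.abs(flat), axis=1)[:, :nonzero]     # largest-magnitude slots
    rows = np.arange(flat.shape[0])[:, None]
    coded[rows, idx] = np.sign(flat[rows, idx]) * scale
    return coded.reshape(w.shape)

rng = np.random.default_rng(0)
w_c = structured_sparse_code(rng.normal(0, 0.1, size=(64, 64)))
print(np.count_nonzero(w_c) / w_c.size)                      # at most 2/8 = 0.25
```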
The knowledge distillation (KD) technique that utilizes a pretrained teacher model for training a student network is exploited for the optimization of quantized deep neural networks (QDNNs). We consider the choice of the teacher network and also investigate the effect of the hyperparameters of KD. Several large floating-point models are employed as the teacher network. The experiment shows that the softmax distribution produced by the teacher is more important than its performance for effective KD training. Since the distribution can be controlled by KD's hyperparameters, we analyze the interrelationship of each...
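For reference, the standard KD objective with temperature T and coefficient λ, as it is commonly combined with the hard-label loss when training a quantized student; the tensors below are placeholders standing in for a quantized student and a floating-point teacher.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.5):
    """(1 - lam) * cross-entropy on hard labels
       + lam * T^2 * KL(teacher || student) on temperature-softened outputs."""
    ce = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return (1.0 - lam) * ce + lam * soft

student = torch.randn(8, 10, requires_grad=True)   # placeholder quantized-student logits
teacher = torch.randn(8, 10)                       # placeholder teacher logits
labels = torch.randint(0, 10, (8,))
kd_loss(student, teacher, labels).backward()
```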
Quantized deep neural networks (QDNNs) are necessary for low-power, high-throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for QDNNs, which contains QDNN training as well as quantization-friendly architecture design. Synthesized data is used to visualize the effects...
Quantization of deep neural networks is essential for efficient implementations. Low-precision networks are typically designed to represent their original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized networks to reach flat minima in the loss surface with the aid of quantization noise. The proposed scheme employs high-low-high-low precision in an alternating manner for network training. The learning rate is also abruptly changed at each...
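A schematic view of such an alternating schedule; the stage lengths, bit-widths, and learning rates below are illustrative assumptions, and `train` is a placeholder for one training stage.

```python
# Illustrative high-low-high-low precision schedule with abrupt LR changes.
schedule = [
    {"bits": 32, "lr": 1e-1, "epochs": 30},   # high precision, large LR
    {"bits": 2,  "lr": 1e-3, "epochs": 30},   # low precision, small LR
    {"bits": 32, "lr": 1e-1, "epochs": 30},   # back to high precision
    {"bits": 2,  "lr": 1e-3, "epochs": 30},   # final low-precision stage
]

def train(model, stage):
    """Placeholder for one training stage at the given precision and LR."""
    print(f"training {stage['epochs']} epochs at {stage['bits']}-bit, lr={stage['lr']}")

model = None   # stands in for the network being trained
for stage in schedule:
    train(model, stage)
```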
Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has been exploited for quantized deep neural network (QDNN) training as a way to restore the performance sacrificed by word-length reduction. KD, however, employs additional hyper-parameters, such as the temperature, the coefficient, and the size of the teacher network for QDNN training. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that they are inter-related, and we also introduce a simple and effective technique that reduces the *coefficient* during...
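A sketch of the coefficient-reduction idea: the weight on the distillation term shrinks as training proceeds so that the hard-label loss dominates at the end. The linear decay and its endpoints are only one possible schedule, not necessarily the paper's rule.

```python
def kd_coefficient(epoch, total_epochs, lam_start=0.9, lam_end=0.0):
    """Linearly reduce the KD coefficient over training (illustrative schedule)."""
    t = epoch / max(total_epochs - 1, 1)
    return lam_start + t * (lam_end - lam_start)

# Per-epoch total loss would be: (1 - lam) * task_loss + lam * distillation_loss.
for epoch in range(0, 100, 20):
    print(epoch, round(kd_coefficient(epoch, 100), 3))
```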
Low-precision deep neural networks (DNNs) are needed for efficient implementations, but severe quantization of the weights often sacrifices the generalization capability and lowers test accuracy. We present a new quantized network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point training, (2) direct quantization of the weights, (3) capturing multiple low-precision models during retraining...
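A simplified sketch of the averaging step: several low-precision checkpoints captured during retraining (e.g. at the low points of a cyclical learning rate) are averaged parameter-wise; the averaged model would then be re-quantized and fine-tuned, which is not shown. The checkpoint handling and the triangular schedule are illustrative.

```python
import torch

def average_checkpoints(state_dicts):
    """Parameter-wise average of captured low-precision models
    (the averaging step only; re-quantization/fine-tuning omitted)."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

def cyclical_lr(step, period=1000, lr_max=1e-2, lr_min=1e-4):
    """Simple triangular cyclical learning rate; checkpoints would be
    captured near the low-LR point of each cycle."""
    phase = (step % period) / period
    return lr_max - (lr_max - lr_min) * phase

# Hypothetical captured checkpoints (here: three random state dicts).
ckpts = [{"w": torch.randn(4, 4)} for _ in range(3)]
avg = average_checkpoints(ckpts)
print(avg["w"].shape, cyclical_lr(999))
```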
Designing a deep neural network (DNN) with good generalization capability is a complex process, especially when the weights are severely quantized. Model averaging is a promising approach for achieving good generalization of DNNs, since the loss surface during training contains many sharp minima. We present a new quantized network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs using model averaging. The proposed approach includes (1) floating-point training, (2) direct quantization of the weights, (3) capturing multiple models...
Quantized deep neural networks (QDNNs) are necessary for low-power, high-throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Therefore, model selection needs to be a part of the QDNN design process. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for QDNNs, which contains training as well as quantization-friendly architecture...
We present the world's first AI-enabled high-frequency trading (HFT) system, LightTrader, which integrates custom AI accelerators and an FPGA-based conventional HFT pipeline for low-latency, high-throughput solutions with a reduced query miss rate. For better utilization, adaptive job scheduling methods are also proposed to further improve performance, where layer-wise workload scaling and dynamic voltage-frequency scaling (DVFS) techniques progressively adjust the workloads of the accelerators, in conjunction...