- Advanced Neural Network Applications
- Neural Networks and Applications
- Machine Learning and Data Classification
- Domain Adaptation and Few-Shot Learning
- Speech Recognition and Synthesis
- Speech and Audio Processing
- Model Reduction and Neural Networks
- Sparse and Compressive Sensing Techniques
- Adversarial Robustness in Machine Learning
- Digital Filter Design and Implementation
- Topic Modeling
- Machine Learning and ELM
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Advanced Image Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Stock Market Forecasting Methods
- Anomaly Detection Techniques and Applications
- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Advanced Data Compression Techniques
- Music and Audio Processing
Rebellion (United Kingdom)
2022
Seoul National University
2017-2021
Fixed-point optimization of deep neural networks plays an important role in hardware-based design and low-power implementations. Many networks show fairly good performance even with 2- or 3-bit precision when the quantized weights are fine-tuned by retraining. We propose an improved fixed-point optimization algorithm that estimates the quantization step size dynamically during retraining. In addition, a gradual quantization scheme is also tested, which sequentially applies fixed-point optimizations from high- to low-precision. The experiments are conducted for...
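A minimal sketch of the core idea, not the paper's exact estimator: the quantization step size is re-computed from the current weight statistics every time the weights are quantized during retraining, instead of being fixed once before fine-tuning. The estimator below (based on the standard deviation) is an illustrative assumption.

```python
import numpy as np

def quantize_adaptive(w, num_bits=2):
    """Uniform symmetric quantization with a step size estimated from the
    current weight statistics (illustrative; the paper's estimator may differ)."""
    delta = 2.0 * np.std(w) / (2 ** (num_bits - 1))   # step size tracks the weights
    max_level = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(w / delta), -max_level - 1, max_level)
    return q * delta, delta

# During retraining, quantization is applied after each weight update so the
# step size follows the evolving weight distribution.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256))
w_q, delta = quantize_adaptive(w, num_bits=2)
print(delta, np.unique(w_q).size)
```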
The quantization of deep neural networks (QDNNs) has been actively studied for deployment in edge devices. Recent studies employ the knowledge distillation (KD) method to improve the performance of quantized networks. In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ). SPEQ is a KD training scheme; however, the teacher is formed by sharing the model parameters of the student network. We obtain soft labels by randomly changing the bit precision of the activation stochastically at each layer of the forward-pass computation. The student is trained...
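A hedged PyTorch sketch of the idea as described above, assuming a toy MLP: the teacher and student share the same weights, and the teacher's soft labels come from a forward pass whose activation bit-width is drawn at random per layer. The helpers `act_quant` and `forward` are illustrative names, not the paper's code.

```python
import torch
import torch.nn.functional as F

def act_quant(x, bits):
    """Uniform quantization of non-negative activations (illustrative).
    A straight-through estimator would be used in practice; omitted here."""
    scale = (2 ** bits - 1) / x.max().clamp(min=1e-8)
    return torch.round(x * scale) / scale

def forward(x, weights, bits_per_layer):
    """Shared-parameter forward pass; only the activation precision differs."""
    h = x
    for i, (w, b) in enumerate(zip(weights, bits_per_layer)):
        h = h @ w
        if i < len(weights) - 1:          # hidden layers: ReLU + quantized activation
            h = act_quant(F.relu(h), b)
    return h

# Hypothetical 3-layer MLP; teacher and student share `weights`.
weights = [torch.randn(32, 64, requires_grad=True),
           torch.randn(64, 64, requires_grad=True),
           torch.randn(64, 10, requires_grad=True)]
x = torch.randn(8, 32)

student_bits = [2, 2, 2]                                        # target precision
teacher_bits = [int(torch.randint(2, 9, ())) for _ in weights]  # random bits per layer

with torch.no_grad():
    soft_labels = F.softmax(forward(x, weights, teacher_bits), dim=-1)
logits = forward(x, weights, student_bits)
loss = F.kl_div(F.log_softmax(logits, dim=-1), soft_labels, reduction="batchmean")
loss.backward()
```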
The Transformer model adopts a self-attention structure and shows very good performance in various natural language processing tasks. However, it is difficult to implement on embedded systems because of its large size. In this study, we quantize the parameters and hidden signals for complexity reduction. Not only the weight and embedding matrices but also the inputs and softmax outputs are quantized to utilize low-precision matrix multiplication. The fixed-point optimization steps consist of quantization sensitivity...
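The sketch below shows the general mechanism that makes this useful, under illustrative bit-widths: once both operands of a matrix multiplication are quantized to integer codes with known scales, the product can be computed in low-precision integer arithmetic and rescaled afterwards. It is not the paper's exact scheme.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization; returns integer codes and the scale."""
    max_level = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / max_level + 1e-12
    return np.clip(np.round(x / scale), -max_level, max_level).astype(np.int32), scale

rng = np.random.default_rng(0)
act = rng.normal(size=(4, 64))      # e.g. hidden signals entering a projection layer
w   = rng.normal(size=(64, 64))     # e.g. a weight matrix of the Transformer
a_q, a_s = quantize(act, bits=8)
w_q, w_s = quantize(w, bits=6)
y = (a_q @ w_q) * (a_s * w_s)       # integer matmul, single float rescale
print(np.max(np.abs(y - act @ w)))  # quantization error of the product
```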
The growing computational demands of AI inference have led to the widespread use of hardware accelerators for different platforms, spanning from the edge to the datacenter/cloud. Certain application areas, such as high-frequency trading (HFT) [1–2], have a hard latency deadline for successful execution. We present our new accelerator, which achieves high compute capability with outstanding single-stream responsiveness for demanding service-level objective (SLO)-based services and pipelined applications, including large...
Most deep neural networks (DNNs) require complex models to achieve high performance. Parameter quantization is widely used for reducing the implementation complexity. Previous studies on quantization were mostly based on extensive simulation using the training data of a specific model. We choose a different approach and attempt to measure the per-parameter capacity of DNN models, interpreting the results to obtain insights on the optimum quantization of parameters. This research uses artificially generated data and generic forms of fully connected DNNs, convolutional...
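A toy sketch of how per-parameter capacity could be probed, purely to illustrate the measurement idea and not the paper's protocol: train a small network to memorize artificially generated data with random labels, check whether it can fit them, and relate the dataset size to the parameter count. All sizes and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def memorization_probe(n_samples, hidden=32, steps=2000):
    """Train a tiny MLP on random data/labels and report training accuracy
    and a crude samples-per-parameter ratio (illustrative only)."""
    torch.manual_seed(0)
    x = torch.randn(n_samples, 16)
    y = torch.randint(0, 2, (n_samples,))
    model = torch.nn.Sequential(torch.nn.Linear(16, hidden), torch.nn.ReLU(),
                                torch.nn.Linear(hidden, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
    n_params = sum(p.numel() for p in model.parameters())
    return acc, n_samples / n_params

for n in (200, 800, 3200):
    print(n, memorization_probe(n))
```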
Deep neural networks (DNNs) usually demand a large number of operations for real-time inference. In particular, fully-connected layers contain a large number of weights, and thus they need many off-chip memory accesses. We propose a weight compression method for deep neural networks, which allows values of +1 or -1 only at predetermined positions of the weights so that decoding using a table can be conducted easily. For example, the structured sparse (8,2) coding allows at most two non-zero values among eight weights. This not only enables...
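A short sketch of (8,2) structured sparse ternary coding as described above: within every group of eight weights, at most the two largest-magnitude entries are kept as ±1 times a scale and the rest are zeroed; the fixed group structure is what keeps table-based decoding simple. The per-layer scale used here is an illustrative assumption.

```python
import numpy as np

def structured_sparse_code(w, group=8, nonzero=2):
    """(group, nonzero) structured sparse ternary coding (illustrative).
    At most `nonzero` positions per group of `group` weights keep +1/-1
    (times a per-layer scale); all other positions become zero."""
    flat = w.reshape(-1, group)
    scale = np.mean(np.abs(flat))                            # illustrative scale
    coded = np.zeros_like(flat)
    idx = np.argsort(-np.abs(flat), axis=1)[:, :nonzero]     # largest-magnitude slots
    rows = np.arange(flat.shape[0])[:, None]
    coded[rows, idx] = np.sign(flat[rows, idx]) * scale
    return coded.reshape(w.shape)

rng = np.random.default_rng(0)
w_c = structured_sparse_code(rng.normal(0, 0.1, size=(64, 64)))
print(np.count_nonzero(w_c) / w_c.size)                      # at most 2/8 = 0.25
```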
The knowledge distillation (KD) technique that utilizes a pretrained teacher model for training a student network is exploited for the optimization of quantized deep neural networks (QDNNs). We consider the choice of the teacher network and also investigate the effect of the hyperparameters of KD. Several large floating-point models are employed as the teacher network. The experiment shows that the softmax distribution produced by the teacher is more important than its performance for effective KD training. Since the distribution can be controlled by KD's hyperparameters, we analyze the interrelationship of each...
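For reference, the standard KD objective with temperature T and coefficient λ, as it is commonly combined with the hard-label loss when training a quantized student; the tensors below are placeholders standing in for a quantized student and a floating-point teacher.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.5):
    """(1 - lam) * cross-entropy on hard labels
       + lam * T^2 * KL(teacher || student) on temperature-softened outputs."""
    ce = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return (1.0 - lam) * ce + lam * soft

student = torch.randn(8, 10, requires_grad=True)   # placeholder quantized-student logits
teacher = torch.randn(8, 10)                       # placeholder teacher logits
labels = torch.randint(0, 10, (8,))
kd_loss(student, teacher, labels).backward()
```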
Quantized deep neural networks (QDNNs) are necessary for low-power, high-throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for QDNNs, which contains QDNN training as well as quantization-friendly architecture design. Synthesized data is used to visualize the effects...
Quantization of deep neural networks is essential for efficient implementations. Low-precision networks are typically designed to represent their original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized networks to reach flat minima in the loss surface with the aid of quantization noise. The proposed scheme employs high-low-high-low precision in an alternating manner for network training. The learning rate is also abruptly changed at each...
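A schematic view of such an alternating schedule; the stage lengths, bit-widths, and learning rates below are illustrative assumptions, and `train` is a placeholder for one training stage.

```python
# Illustrative high-low-high-low precision schedule with abrupt LR changes.
schedule = [
    {"bits": 32, "lr": 1e-1, "epochs": 30},   # high precision, large LR
    {"bits": 2,  "lr": 1e-3, "epochs": 30},   # low precision, small LR
    {"bits": 32, "lr": 1e-1, "epochs": 30},   # back to high precision
    {"bits": 2,  "lr": 1e-3, "epochs": 30},   # final low-precision stage
]

def train(model, stage):
    """Placeholder for one training stage at the given precision and LR."""
    print(f"training {stage['epochs']} epochs at {stage['bits']}-bit, lr={stage['lr']}")

model = None   # stands in for the network being trained
for stage in schedule:
    train(model, stage)
```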
Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has been exploited for quantized deep neural network (QDNN) training as a way to restore the performance sacrificed by word-length reduction. KD, however, employs additional hyper-parameters, such as the temperature, the coefficient, and the size of the teacher network for QDNN training. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that they are inter-related, and we also introduce a simple and effective technique that reduces the *coefficient* during...
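A sketch of the coefficient-reduction idea: the weight on the distillation term shrinks as training proceeds so that the hard-label loss dominates at the end. The linear decay and its endpoints are only one possible schedule, not necessarily the paper's rule.

```python
def kd_coefficient(epoch, total_epochs, lam_start=0.9, lam_end=0.0):
    """Linearly reduce the KD coefficient over training (illustrative schedule)."""
    t = epoch / max(total_epochs - 1, 1)
    return lam_start + t * (lam_end - lam_start)

# Per-epoch total loss would be: (1 - lam) * task_loss + lam * distillation_loss.
for epoch in range(0, 100, 20):
    print(epoch, round(kd_coefficient(epoch, 100), 3))
```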
Low-precision deep neural networks (DNNs) are needed for efficient implementations, but severe quantization of the weights often sacrifices the generalization capability and lowers test accuracy. We present a new quantized network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point training, (2) direct quantization of the weights, (3) capturing multiple low-precision models during retraining...
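A simplified sketch of the averaging step: several low-precision checkpoints captured during retraining (e.g. at the low points of a cyclical learning rate) are averaged parameter-wise; the averaged model would then be re-quantized and fine-tuned, which is not shown. The checkpoint handling and the triangular schedule are illustrative.

```python
import torch

def average_checkpoints(state_dicts):
    """Parameter-wise average of captured low-precision models
    (the averaging step only; re-quantization/fine-tuning omitted)."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

def cyclical_lr(step, period=1000, lr_max=1e-2, lr_min=1e-4):
    """Simple triangular cyclical learning rate; checkpoints would be
    captured near the low-LR point of each cycle."""
    phase = (step % period) / period
    return lr_max - (lr_max - lr_min) * phase

# Hypothetical captured checkpoints (here: three random state dicts).
ckpts = [{"w": torch.randn(4, 4)} for _ in range(3)]
avg = average_checkpoints(ckpts)
print(avg["w"].shape, cyclical_lr(999))
```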
Designing a deep neural network (DNN) with good generalization capability is a complex process, especially when the weights are severely quantized. Model averaging is a promising approach for achieving good generalization of DNNs, since the loss surface during training contains many sharp minima. We present a new quantized network optimization approach, stochastic quantized weight averaging (SQWA), to design low-precision DNNs using model averaging. The proposed approach includes (1) floating-point training, (2) direct quantization of the weights, (3) capturing multiple models...
Quantized deep neural networks (QDNNs) are necessary for low-power, high-throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Therefore, model selection needs to be a part of the QDNN design process. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for QDNNs, which contains training as well as quantization-friendly architecture...
We present the world's first AI-enabled high-frequency trading (HFT) system, LightTrader, which integrates custom AI accelerators and an FPGA-based conventional HFT pipeline for low-latency, high-throughput solutions with a reduced query miss rate. For better utilization, adaptive job scheduling methods are also proposed to further improve performance, where layer-wise workload scaling and dynamic voltage-frequency scaling (DVFS) techniques progressively adjust the workloads of the accelerators, in conjunction...