- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Advanced Memory and Neural Computing
- Human Pose and Action Recognition
- Stochastic Gradient Optimization Techniques
- Machine Learning and ELM
- Neural Networks and Applications
- Generative Adversarial Networks and Image Synthesis
- Anomaly Detection Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Video Analysis and Summarization
- Artificial Intelligence in Games
- CCD and CMOS Imaging Sensors
- Error Correcting Code Techniques
- Parallel Computing and Optimization Techniques
- Video Surveillance and Tracking Methods
- IoT and Edge/Fog Computing
- Sparse and Compressive Sensing Techniques
- Cell Image Analysis Techniques
- Ferroelectric and Negative Capacitance Devices
- Advanced Data Compression Techniques
- Neural Dynamics and Brain Function
- Speech Recognition and Synthesis
- Network Security and Intrusion Detection
Clemson University
2022-2024
Ocean University of China
2024
Northeastern University
2007-2023
Universidad del Noreste
2019-2022
Northwest University
2022
William & Mary
2019
Syracuse University
2017-2018
Chinese Academy of Sciences
2018
Institute of Computing Technology
2018
Stony Brook University
2006-2007
Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which affects throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee...
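To make the irregularity concrete, here is a minimal magnitude-based pruning sketch in NumPy (a generic baseline, not this paper's method; all names are illustrative). The surviving weights land at scattered positions, which is exactly the irregular structure that hurts throughput:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` of them are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The threshold is the k-th smallest absolute value.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzeros: {np.count_nonzero(w_pruned)} / {w.size}")  # irregularly scattered
```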
With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing Deep Neural Network (DNN) inference is still challenging considering the high computation and storage demands, specifically if real-time performance with high accuracy is needed. Weight pruning of DNNs has been proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained and accurate, but not hardware...
Structured weight pruning is a representative model compression technique for DNNs that reduces storage and computation requirements and accelerates inference. An automatic hyperparameter determination process is necessary due to the large number of flexible hyperparameters. This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporating the combination of structured pruning schemes in the automatic process; (ii) adopting the state-of-the-art ADMM-based structured weight pruning as the core algorithm, and proposing an innovative...
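ADMM-based pruning, which AutoCompress builds on, alternates an SGD step on a quadratically regularized loss with a Euclidean projection onto the sparsity constraint set. A minimal sketch of one round of the Z- and dual-variable updates (parameter values are illustrative, and the W-update via SGD is only indicated in a comment):

```python
import numpy as np

def project_to_sparsity(x: np.ndarray, num_nonzero: int) -> np.ndarray:
    """Euclidean projection onto {Z : ||Z||_0 <= num_nonzero}: keep the largest magnitudes."""
    if num_nonzero >= x.size:
        return x.copy()
    threshold = np.partition(np.abs(x).ravel(), -num_nonzero)[-num_nonzero]
    return np.where(np.abs(x) >= threshold, x, 0.0)

rho = 1e-3                       # ADMM penalty parameter (illustrative)
W = np.random.default_rng(1).normal(size=(128, 128))
U = np.zeros_like(W)             # scaled dual variable
# W-update (not shown): SGD on loss(W) + (rho/2) * ||W - Z + U||^2
Z = project_to_sparsity(W + U, num_nonzero=int(0.1 * W.size))  # Z-update
U = U + W - Z                                                  # dual update
```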
Model compression techniques for Deep Neural Networks (DNNs) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstream pruning methods representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy, but is not hardware friendly; structured, coarse-grained pruning exploits hardware-efficient structures, but suffers from an accuracy drop when the pruning rate is high. In...
Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. This motivates intensive research on model compression with two main approaches. Weight pruning leverages the redundancy in the number of weights; it can be performed in a non-structured manner, which offers a flexible pruning rate but incurs index overhead from irregular weights, or in a structured manner, which preserves the full matrix structure at a lower pruning rate. Weight quantization...
Channel pruning has been broadly recognized as an effective technique to reduce the computation and memory cost of deep convolutional neural networks. However, conventional pruning methods have limitations in that they are restricted to the pruning process only and require a fully pre-trained large model. Such limitations may lead to sub-optimal model quality as well as excessive training cost. In this paper, we propose a novel channel exploration methodology, dubbed CHEX, to rectify these problems. As opposed to the pruning-only strategy, we repeatedly prune...
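CHEX's actual channel-exploration criterion is more involved than a norm heuristic, so the following is only a generic channel-pruning sketch in PyTorch showing the kind of per-filter ranking such methods start from (function names are ours):

```python
import torch
import torch.nn as nn

def channel_importance(conv: nn.Conv2d) -> torch.Tensor:
    """Generic importance score: L2 norm of each output filter."""
    return conv.weight.detach().flatten(1).norm(dim=1)

def keep_top_channels(conv: nn.Conv2d, keep: int) -> torch.Tensor:
    """Indices of the `keep` most important output channels."""
    return torch.topk(channel_importance(conv), keep).indices.sort().values

conv = nn.Conv2d(64, 128, kernel_size=3)
kept = keep_top_channels(conv, keep=96)
# A prune-and-regrow scheme in the spirit of CHEX would periodically re-rank
# and reactivate previously pruned channels instead of committing once.
```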
With the trend to deploy Deep Neural Network (DNN) inference models on edge devices with limited resources, quantization techniques have been widely used to reduce on-chip storage and improve computation throughput. However, existing DNN quantization work deploying below 8-bit either suffers from evident accuracy loss or faces a big gap between the theoretical improvement of throughput and the practical speedup. In this work, we propose a general framework, called FILM-QNN, to quantize and accelerate multiple DNN models across...
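FILM-QNN's specific intra-layer mixed-precision scheme is not reproduced here; the sketch below only shows the per-channel uniform "fake" quantization that such sub-8-bit frameworks are built on (bit assignments and names are illustrative):

```python
import torch

def quantize_per_channel(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-output-channel uniform quantization (simulated/fake-quant)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().flatten(1).max(dim=1).values / qmax      # one scale per channel
    scale = scale.clamp(min=1e-8).view(-1, *([1] * (w.dim() - 1)))
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, for accuracy simulation

w = torch.randn(128, 64, 3, 3)
# A mixed-precision assignment might give 4 bits to quantization-tolerant
# channels and 8 bits to sensitive ones.
w4, w8 = quantize_per_channel(w, 4), quantize_per_channel(w, 8)
```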
Weight pruning methods for deep neural networks (DNNs) have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning has been proposed to overcome the limitations of irregular network structure and enable actual GPU acceleration. However, in prior work, the pruning rate (degree of sparsity) and GPU acceleration are limited (to less than 50%) when accuracy needs to be maintained. In this work, we overcome these limitations by proposing a unified,...
Hardware acceleration of deep learning systems has been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed that is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts general block-circulant matrices for a fine-grained tradeoff between accuracy and compression ratio. It applies to both...
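The payoff of circulant structure is that a b x b block needs only b stored values, and its matrix-vector product reduces to FFTs. A self-contained NumPy sketch (the block layout and names are ours):

```python
import numpy as np

def circulant_matvec(col: np.ndarray, x: np.ndarray) -> np.ndarray:
    """y = C @ x for the circulant matrix C whose first column is `col`.
    FFT cuts the cost from O(b^2) to O(b log b) and storage from b^2 to b."""
    return np.real(np.fft.ifft(np.fft.fft(col) * np.fft.fft(x)))

def block_circulant_matvec(cols: np.ndarray, x: np.ndarray) -> np.ndarray:
    """y = W @ x where W consists of p x q circulant blocks of size b;
    `cols` has shape (p, q, b) holding each block's defining vector."""
    p, q, b = cols.shape
    xb = x.reshape(q, b)
    return np.stack([
        sum(circulant_matvec(cols[i, j], xb[j]) for j in range(q))
        for i in range(p)
    ]).ravel()

# Sanity check against a dense circulant block.
rng = np.random.default_rng(2)
c, v = rng.normal(size=8), rng.normal(size=8)
C = np.stack([np.roll(c, k) for k in range(8)], axis=1)  # first column is c
assert np.allclose(C @ v, circulant_matvec(c, v))
```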
State-of-the-art DNN structures involve high computation and a great demand for memory storage, which pose an intensive challenge on hardware resources. To mitigate these challenges, weight pruning techniques have been studied. However, an accurate solution for extreme structured pruning that combines different types of sparsity still awaits unraveling, due to the extremely reduced number of weights in such networks. In this paper, we propose a two-step (filter and column prune) framework by incorporating the alternating direction method of multipliers...
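As a rough illustration of combining two sparsity types, the sketch below zeroes whole filters and then whole columns of the reshaped weight matrix by L2 norm; in the ADMM formulation this kind of operation plays the role of the projection step, and the norm-based selection is only an illustrative stand-in:

```python
import numpy as np

def prune_filters_then_columns(W: np.ndarray, keep_filters: int, keep_cols: int) -> np.ndarray:
    """Two-step structured pruning on a conv weight of shape (C_out, C_in, k, k):
    zero whole filters (rows of the reshaped matrix), then whole columns."""
    M = W.reshape(W.shape[0], -1).copy()
    weak_rows = np.argsort(np.linalg.norm(M, axis=1))[:-keep_filters]
    M[weak_rows, :] = 0.0
    weak_cols = np.argsort(np.linalg.norm(M, axis=0))[:-keep_cols]
    M[:, weak_cols] = 0.0
    return M.reshape(W.shape)

W = np.random.default_rng(4).normal(size=(64, 32, 3, 3))
W_pruned = prune_filters_then_columns(W, keep_filters=32, keep_cols=144)
```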
Recently, a new trend of exploring sparsity for accelerating neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices. The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S) that ensure superior accuracy at high sparsity ratios. Different from existing works on sparse training, this work reveals the importance of sparsity schemes on the performance...
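A sketch of what one elastic-mutation step could look like in PyTorch: the weakest active weights are dropped and an equal number of inactive positions are activated, so the sparsity (and thus the memory bound) stays fixed. MEST's actual mutation and growth criteria differ in detail; names here are ours:

```python
import torch

@torch.no_grad()
def elastic_mutation(weight: torch.Tensor, mask: torch.Tensor, mutate_frac: float) -> torch.Tensor:
    """Drop the weakest active weights, grow the same number elsewhere."""
    active = mask.bool()
    n_mut = int(active.sum().item() * mutate_frac)
    if n_mut == 0:
        return mask
    # Drop: smallest-magnitude active weights.
    scores = weight.abs().masked_fill(~active, float("inf")).ravel()
    drop = torch.topk(scores, n_mut, largest=False).indices
    # Grow: random inactive positions (random growth avoids storing dense
    # gradients, in keeping with the memory-economic goal).
    inactive = (~active).ravel().nonzero().squeeze(1)
    grow = inactive[torch.randperm(inactive.numel())[:n_mut]]
    new_mask = mask.ravel().clone()
    new_mask[drop], new_mask[grow] = 0.0, 1.0
    return new_mask.view_as(mask)
```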
Both industry and academia have extensively investigated hardware acceleration. To address the demands of increasing computational capability and memory requirements, in this work we propose a structured weight matrices (SWM)-based compression technique for both Field Programmable Gate Array (FPGA) and application-specific integrated circuit (ASIC) implementations. On the algorithm side, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. It can reduce...
Deep learning solutions are being increasingly deployed in mobile applications, at least for the inference phase. Due to the large model size and computational requirements, compression of deep neural networks (DNNs) becomes necessary, especially considering the real-time requirements of embedded systems. In this paper, we extend prior work on systematic DNN weight pruning using ADMM (Alternating Direction Method of Multipliers). We integrate ADMM regularization with masked mapping/retraining, thereby...
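Masked retraining itself is simple to state: once the ADMM phase fixes which positions are zero, every optimizer update is followed by re-applying the binary masks so pruned weights stay exactly zero. A PyTorch sketch (the surrounding training loop and mask construction are assumed):

```python
import torch
import torch.nn as nn

def masked_retrain_step(model: nn.Module, masks: dict[str, torch.Tensor],
                        loss: torch.Tensor, opt: torch.optim.Optimizer) -> None:
    """One retraining step under fixed pruning masks."""
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])  # keep pruned positions at exactly zero
```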
Weight pruning is a popular technique to reduce the size and computation complexity of Convolutional Neural Networks (CNNs). Despite its success in reducing model size, weight pruning has brought limited benefit to CNN inference performance, due to the irregularity introduced by sparse convolution operations. In this work, we aim to improve the performance of sparse convolutions on GPUs by mitigating this irregularity. We find that existing optimization techniques for sparse matrix computations fail to accelerate sparse convolutions, and we observe that the main...
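For intuition, a pruned convolution can be lowered (via im2col) to a sparse-times-dense GEMM; the scattered nonzeros of the sparse matrix are the source of the irregular memory accesses. A SciPy sketch under that assumption (shapes are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(3)
W = rng.normal(size=(128, 64 * 3 * 3))            # 3x3 conv, C_in=64, C_out=128
W[np.abs(W) < 1.5] = 0.0                          # ~87% unstructured sparsity
patches = rng.normal(size=(64 * 3 * 3, 56 * 56))  # im2col'd input
out = csr_matrix(W) @ patches                     # sparse x dense GEMM
```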
To address the large model size and intensive computation requirements of deep neural networks (DNNs), weight pruning techniques have been proposed; they generally fall into two categories, i.e., static regularization-based pruning and dynamic regularization-based pruning. However, the former currently suffers from either complex workloads or accuracy degradation, while the latter takes a long time to tune its parameters to achieve the desired pruning rate without accuracy loss. In this paper, we propose a unified DNN weight pruning framework with dynamically updated...
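Generically, dynamically updated regularization can be pictured as a quadratic penalty toward the current sparse target whose strength grows over training, tightening the pruning gradually; the schedule and form below are illustrative only, not the paper's exact formulation:

```python
import torch

def dynamic_reg_penalty(weight: torch.Tensor, target: torch.Tensor,
                        step: int, rho0: float = 1e-4, growth: float = 1.01) -> torch.Tensor:
    """Quadratic pull toward the sparse target, with a penalty that grows per step."""
    rho = rho0 * growth ** step
    return 0.5 * rho * (weight - target).pow(2).sum()
```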
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. To mitigate this concern, we propose a privacy-preserving-oriented pruning and acceleration framework that does not require the private training dataset. At the algorithm level of the framework, a systematic weight pruning technique based on the alternating direction...
Weight pruning is a powerful technique to realize model compression. We propose PCNN, a fine-grained regular 1D pruning method. A novel index format called Sparsity Pattern Mask (SPM) is presented to encode the sparsity in PCNN. Leveraging SPM with limited patterns and non-zero sequences of equal length, PCNN can be efficiently employed in hardware. Evaluated on VGG-16 and ResNet-18, our method achieves a compression rate of up to 8.4× with only 0.2% accuracy loss. We also implement a pattern-aware architecture in a 55nm process, achieving 9.0×...
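The SPM idea can be sketched as follows: fix a small library of allowed per-kernel sparsity patterns, then store only a pattern index plus the non-zero values for each kernel. The patterns below are made up for illustration; PCNN selects its own:

```python
import torch

# Four allowed 1D patterns over a flattened 3x3 kernel (4 nonzeros each),
# so non-zero sequences all have equal length.
PATTERNS = torch.tensor([
    [1, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 1, 0],
], dtype=torch.float32)

def assign_patterns(weight: torch.Tensor):
    """Per 3x3 kernel, pick the pattern preserving the most weight magnitude;
    returns the pruned weights and the pattern indices (the SPM)."""
    mags = weight.flatten(2).abs()                    # (C_out, C_in, 9)
    scores = torch.einsum("oik,pk->oip", mags, PATTERNS)
    spm = scores.argmax(dim=-1)                       # (C_out, C_in)
    mask = PATTERNS[spm].view_as(weight)
    return weight * mask, spm

w = torch.randn(8, 4, 3, 3)
w_pruned, spm = assign_patterns(w)
```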