Optimized Compression for Implementing Convolutional Neural Networks on FPGA

Keywords: graphics processing unit; pruning; Stratix
DOI: 10.3390/electronics8030295 Publication Date: 2019-03-07T15:52:22Z
ABSTRACT
The field programmable gate array (FPGA) is widely considered a promising platform for convolutional neural network (CNN) acceleration. However, the large number of parameters in CNNs causes heavy computing and memory burdens for FPGA-based CNN implementation. To solve this problem, this paper proposes an optimized compression strategy and realizes an FPGA-based accelerator for CNNs. Firstly, a reversed-pruning strategy is proposed, which reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset. Peak-pruning is further introduced to achieve better compressibility. Moreover, quantization gives another 4× reduction with negligible loss of accuracy. Secondly, an efficient storage technique, which aims to reduce the overall cache overhead of the convolutional layers and the fully connected layers, is presented for each layer type respectively. Finally, the effectiveness of the proposed strategy is verified by an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format of sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, the overall performance of our accelerator achieves 9.73 fps on the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, and 822.0× and 15.8× improvements in energy efficiency, separately. This novel compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM), and recurrent neural networks (RNNs).
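As a rough illustration of the pruning-plus-quantization pipeline the abstract describes, the sketch below combines generic magnitude pruning with linear 8-bit quantization in NumPy. The function names and the simple thresholding scheme are assumptions for illustration only; the paper's actual reversed-pruning and peak-pruning strategies, and its quantization scheme, are more elaborate than this.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights (generic magnitude pruning;
    a stand-in for the paper's reversed-pruning / peak-pruning strategies)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


def quantize_8bit(weights: np.ndarray):
    """Symmetric linear 8-bit quantization, giving the roughly 4x size
    reduction over 32-bit floats that the abstract cites."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # toy weight matrix

pruned = magnitude_prune(w, sparsity=12 / 13)  # keep ~1/13 of the weights
q, scale = quantize_8bit(pruned)               # 8-bit instead of 32-bit

nnz = np.count_nonzero(pruned)
print(f"kept {nnz}/{w.size} weights ({w.size / nnz:.1f}x parameter reduction)")
```

In practice the surviving nonzeros would then be packed in a sparse format (the paper's improved storage of sparse data) so that both the zeros and the unused bits are actually removed from memory, which is how the combined 28× model-size reduction becomes possible.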