Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Keywords: Speedup; Control reconfiguration; Gate array; High-Level Synthesis
DOI: 10.1609/aaai.v32i1.11653 Publication Date: 2022-06-24T21:08:34Z
ABSTRACT
Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed, which is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts the general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. It applies to both fully-connected and convolutional layers, and contains a mathematically rigorous proof of the effectiveness of the method. The proposed algorithm reduces computational complexity per layer from O(n²) to O(n log n), and storage complexity from O(n²) to O(n), for both training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, pipelining, resource re-using, and hierarchical control. Experimental results demonstrate that the framework achieves at least a 152X speedup and 71X energy efficiency gain compared with the IBM TrueNorth processor under the same test accuracy, and at least a 31X energy efficiency gain compared with a reference FPGA-based work.
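The complexity reduction described in the abstract comes from the structure of circulant blocks: each k×k circulant block is fully determined by its first column, so storage drops from O(k²) to O(k), and the block-vector product is a circular convolution computable with FFTs in O(k log k) rather than O(k²). A minimal NumPy sketch of this idea follows; the function names and block layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply a circulant matrix by vector x in O(k log k) via FFT.

    c is the first column of the circulant matrix C, where
    C[i, j] = c[(i - j) mod k]; C @ x equals the circular
    convolution of c and x, computed here in the Fourier domain.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, k):
    """Compute y = W x for a block-circulant weight matrix W.

    blocks[i][j] holds only the first column (length k) of the
    circulant block W_ij, so a (p*k) x (q*k) matrix needs just
    p*q*k stored values instead of p*q*k*k.
    """
    p, q = len(blocks), len(blocks[0])
    x_parts = x.reshape(q, k)          # split input into q chunks of size k
    y = np.zeros(p * k)
    for i in range(p):
        acc = np.zeros(k)
        for j in range(q):             # accumulate FFT-based block products
            acc += circulant_matvec(blocks[i][j], x_parts[j])
        y[i * k:(i + 1) * k] = acc
    return y
```

For example, a 2x4 block-circulant matrix with k = 2 is represented by two length-2 vectors, and `block_circulant_matvec` reproduces the dense matrix-vector product while never materializing the full matrix.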