Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework
KEYWORDS
Speedup · Control reconfiguration · Gate array · High-Level Synthesis
DOI: 10.1609/aaai.v32i1.11653
Publication Date: 2022-06-24T21:08:34Z
AUTHORS (11)
ABSTRACT
Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed, which is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. It applies to both fully-connected and convolutional layers and contains a mathematically rigorous proof of the effectiveness of the method. The proposed algorithm reduces the computational complexity per layer from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), for both training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, pipelining, resource re-using, and hierarchical control. Experimental results demonstrate that the proposed framework achieves at least 152X speedup and 71X energy efficiency gain compared with the IBM TrueNorth processor under the same test accuracy, and at least a 31X energy efficiency gain compared with a reference FPGA-based work.
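The O(n log n) per-layer complexity claimed in the abstract comes from the structure of a block-circulant weight matrix: each k×k sub-block is circulant, so its matrix-vector product is a circular convolution that can be computed via the FFT instead of a dense multiply. A minimal NumPy sketch of this idea follows; the function names and block layout are illustrative assumptions, not the paper's actual implementation, which targets FPGAs.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c by vector x.

    Uses the identity C @ x = IFFT(FFT(c) * FFT(x)) (circular convolution),
    which costs O(k log k) instead of O(k^2) for a k x k block.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, k):
    """Multiply a block-circulant matrix by x.

    blocks[i][j] is the length-k defining vector (first column) of the
    circulant sub-block W_ij; x is split into length-k chunks, and
    y_i = sum_j W_ij @ x_j, with each sub-product done via FFT.
    """
    p, q = len(blocks), len(blocks[0])
    y = np.zeros(p * k)
    for i in range(p):
        for j in range(q):
            y[i * k:(i + 1) * k] += circulant_matvec(
                blocks[i][j], x[j * k:(j + 1) * k])
    return y
```

Storage drops to O(n) as well, since each k×k block is fully described by its length-k first column; the block size k is the knob that trades compression ratio against accuracy.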
CITATIONS (9)