A TensorFlow Extension Framework for Optimized Generation of Hardware CNN Inference Engines

KEYWORDS
Application-specific integrated circuit; Speedup; Hardware acceleration; Lookup table; Graphics processing unit; MNIST database
DOI: 10.3390/technologies8010006 | Publication Date: 2020-01-15
ABSTRACT
The workloads of Convolutional Neural Networks (CNNs) exhibit a streaming nature that makes them attractive for reconfigurable architectures such as Field-Programmable Gate Arrays (FPGAs), while their increased need for low power and speed has established Application-Specific Integrated Circuit (ASIC)-based accelerators as alternative efficient solutions. During the last five years, the development of Hardware Description Language (HDL)-based CNN accelerators, targeting either FPGA or ASIC, has attracted huge academic interest due to their high performance and room for optimizations. Towards this direction, we propose a library-based framework that extends TensorFlow, the well-established machine learning framework, and automatically generates high-throughput inference engines for FPGAs and ASICs. The framework allows software developers to exploit the benefits of FPGA/ASIC acceleration without requiring any expertise in HDL or low-level design. Moreover, it provides a set of optimization knobs concerning the model architecture and the engine generation, allowing the developer to tune the accelerator according to the requirements of the respective use case. Our framework is evaluated by optimizing LeNet on the MNIST dataset and implementing FPGA- and ASIC-based accelerators using the generated engine. The optimal FPGA-based accelerator on a Zynq-7000 delivers 93% less memory footprint and 54% less Look-Up Table (LUT) utilization, as well as up to 10× speedup in inference execution vs. different Graphics Processing Unit (GPU) and Central Processing Unit (CPU) implementations of the same model, in exchange for a negligible accuracy loss, i.e., 0.89%. For the same accuracy drop, a 45 nm standard-cell-based ASIC implementation operates at 520 MHz, occupies an area of 0.059 mm², and consumes ∼7.5 mW.
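
The abstract does not disclose the framework's actual interface; the Python sketch below only illustrates the workflow it describes, i.e., defining a CNN with standard TensorFlow/Keras and handing it to a generator with tunable knobs. The hdl_gen module, its generate_engine function, and the knob names (target, fixed_point_bits, unroll_factor) are hypothetical placeholders, not the paper's API; only the LeNet-style model definition uses the real tf.keras API.

import tensorflow as tf

# A standard LeNet-style CNN for 28x28 grayscale MNIST digits,
# of the kind the paper evaluates.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(6, 5, activation="tanh"),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Conv2D(16, 5, activation="tanh"),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="tanh"),
    tf.keras.layers.Dense(84, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Hypothetical generator call (placeholder names): the knobs shown here,
# such as fixed-point width and FPGA/ASIC target selection, stand in for
# the optimization options the abstract describes.
# import hdl_gen
# engine = hdl_gen.generate_engine(model, target="fpga",
#                                  fixed_point_bits=8, unroll_factor=4)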