CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
Application-specific integrated circuit
Hardware acceleration
Symmetric multiprocessor system
Speedup
DOI: 10.1145/3686163
Publication Date: 2024-08-05T15:51:50Z
AUTHORS (14)
ABSTRACT
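Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic with AI Engine processors optimized for AI/ML. An array of 400 AI Engine processors executing at 1 GHz can provide up to 6.4 TFLOPS performance for 32-bit floating-point (FP32) data. However, machine learning models often contain both large and small MM operations. While large MM operations can be parallelized efficiently across many cores, small MM operations typically cannot. We observe that executing some small MM layers from the BERT natural language processing model on a large, monolithic MM accelerator in Versal ACAP achieved less than 5% of the theoretical peak performance. Therefore, one key question arises: How can we design accelerators to fully use the abundant computation resources under limited communication bandwidth for end-to-end applications with multiple MM layers of diverse sizes? We identify the biggest system throughput bottleneck as resulting from the mismatch between the massive computation resources of one monolithic accelerator and the various MM layers of small sizes in the application. To resolve this problem, we propose the CHARM framework to compose multiple diverse MM accelerator architectures working concurrently on different layers within one application. CHARM includes analytical models that guide the design space exploration to determine accelerator partitions and layer scheduling. To facilitate the system designs, CHARM automatically generates code, enabling thorough onboard design verification. We deploy the CHARM framework for four different deep learning applications in FP32, INT16, and INT8 data types, including BERT, ViT, NCF, and MLP, on the AMD/Xilinx Versal ACAP VCK190 evaluation board. Our experiments show that we achieve 1.46 TFLOPS, 1.61 TFLOPS, 1.74 TFLOPS, and 2.94 TFLOPS inference throughput for BERT, ViT, NCF, and MLP in the FP32 data type, respectively, which obtain 5.29\(\times\), 32.51\(\times\), 1.00\(\times\), and 1.00\(\times\) throughput gains compared to one monolithic accelerator. CHARM achieves the maximum throughput of 1.91 TOPS, 1.18 TOPS, 4.06 TOPS, and 5.81 TOPS in the INT16 data type for the four applications. The maximum throughput achieved by CHARM in the INT8 data type is 3.65 TOPS, 1.28 TOPS, 10.19 TOPS, and 21.58 TOPS, respectively. We have open-sourced our tools, including detailed step-by-step guides to reproduce all the results presented in this article and to enable other users to learn and leverage the CHARM framework and tools in their end-to-end systems: https://github.com/arc-research-lab/CHARM .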
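The peak-performance and throughput figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not part of the CHARM tools; all names in it are hypothetical. It derives the per-processor FLOPs per cycle implied by the 400-processor, 1 GHz, 6.4 TFLOPS figures, the fraction of FP32 peak each application reaches, and the monolithic-baseline throughput implied by the reported gains for BERT and ViT. The baseline values are an inference from the reported ratios, not numbers stated in the abstract.

```python
# Figures taken from the abstract; all identifiers here are illustrative only.
NUM_AI_ENGINES = 400      # AI Engine processors in the array
CLOCK_HZ = 1e9            # 1 GHz
PEAK_FP32_TFLOPS = 6.4    # quoted theoretical peak for FP32

# FP32 inference throughput reported for each application (TFLOPS).
FP32_TFLOPS = {"BERT": 1.46, "ViT": 1.61, "NCF": 1.74, "MLP": 2.94}

# Reported gains over one monolithic accelerator (BERT and ViT only).
FP32_GAIN = {"BERT": 5.29, "ViT": 32.51}


def implied_flops_per_cycle() -> float:
    """FLOPs per cycle per AI Engine implied by the quoted peak."""
    return PEAK_FP32_TFLOPS * 1e12 / (NUM_AI_ENGINES * CLOCK_HZ)


def peak_fraction(tflops: float) -> float:
    """Fraction of the quoted FP32 peak that a given throughput represents."""
    return tflops / PEAK_FP32_TFLOPS


def implied_baseline_tflops(app: str) -> float:
    """Monolithic-baseline throughput inferred as reported throughput / gain."""
    return FP32_TFLOPS[app] / FP32_GAIN[app]


if __name__ == "__main__":
    print(f"implied FLOPs/cycle per AI Engine: {implied_flops_per_cycle():.1f}")
    for app, tflops in FP32_TFLOPS.items():
        print(f"{app}: {tflops} TFLOPS = {peak_fraction(tflops):.1%} of peak")
    for app in FP32_GAIN:
        base = implied_baseline_tflops(app)
        print(f"{app} implied baseline: {base:.3f} TFLOPS "
              f"({peak_fraction(base):.1%} of peak)")
```

Under these figures, the implied end-to-end baselines for BERT and ViT both fall below 5% of the FP32 peak, which is in line with the per-layer observation reported for the monolithic accelerator.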
REFERENCES (54)
CITATIONS (3)