Productively Generating a High-Performance Linear Algebra Library on FPGAs

DOI: 10.1145/3723046 Publication Date: 2025-03-11T16:50:21Z
ABSTRACT
Linear algebra computations can be greatly accelerated using spatial accelerators on FPGAs. As a standard building block of linear algebra applications, BLAS covers a wide range of compute patterns that vary vastly in data reuse, bottleneck resources, matrix storage layouts, and data types. However, existing implementations of BLAS routines on FPGAs face a dilemma between productivity and performance: they either require extensive human effort or fail to exploit the properties of individual routines for acceleration. We introduce Lasa, a framework composed of a programming model and a compiler, designed to resolve the dilemma by abstracting (for productivity) and specializing (for performance) the architecture of a spatial accelerator. The programming model realizes systolic arrays using uniform recurrence equations and space-time transforms. We propose streaming tensors, an intuitive dataflow-style abstraction that uniformly describes the movement, storage, and transposition of input and output data across the spatial components. From the streaming tensors, our compiler automatically builds a customized memory hierarchy on an FPGA and further specializes the architecture with transparent FPGA-specific optimizations. Using this framework, we develop a complete BLAS library, demonstrating performance on par with expert-written HLS code for BLAS level 3 routines, 76%-94% of machine peak for level 1 and 2 routines, and 1.6X-13X speedup by leveraging matrix properties such as symmetry, triangularity, and bandedness.
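The abstract's mention of "uniform recurrence equations and space-time transforms" refers to the classic URE formulation of matrix multiply, in which operands propagate between neighboring index points and a space-time transform maps two index dimensions onto a 2-D systolic array (space) and the third onto clock steps (time). The sketch below is not Lasa's actual DSL; it is a minimal, hedged Python illustration of that formulation, with space = (i, j) and time = k.

```python
import numpy as np

def matmul_ure(A, B):
    """Evaluate C = A @ B via the uniform recurrence equations
        a(i,j,k) = a(i,j-1,k)            # A values flow along j
        b(i,j,k) = b(i-1,j,k)            # B values flow along i
        c(i,j,k) = c(i,j,k-1) + a*b      # partial sums accumulate along k
    A space-time transform would assign (i, j) to the PE grid (space)
    and k to clock cycles (time), yielding a 2-D systolic array."""
    I, K = A.shape
    K2, J = B.shape
    assert K == K2, "inner dimensions must match"
    a = np.zeros((I, J, K))
    b = np.zeros((I, J, K))
    c = np.zeros((I, J, K))
    for k in range(K):               # time axis under space = (i, j)
        for i in range(I):
            for j in range(J):
                # Boundary index points read from memory; interior
                # points only receive values from a fixed-distance
                # neighbor -- the "uniform" dependence property.
                a[i, j, k] = A[i, k] if j == 0 else a[i, j - 1, k]
                b[i, j, k] = B[k, j] if i == 0 else b[i - 1, j, k]
                acc = 0.0 if k == 0 else c[i, j, k - 1]
                c[i, j, k] = acc + a[i, j, k] * b[i, j, k]
    return c[:, :, K - 1]            # drained results after the last step
```

Because every dependence has a constant distance vector, the same loop nest can be retimed and laid out in space without changing the values computed, which is what makes the systolic mapping mechanical for a compiler.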