Accelerating Sparse DNNs Based on Tiled GEMM
DOI:
10.48550/arxiv.2402.10876
Publication Date:
2024-02-16
ABSTRACT
Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often distribute the remaining weights randomly to maintain accuracy, leading to irregular computations. Consequently, unstructured sparse models cannot achieve meaningful speedups on commodity hardware built for dense matrix computations. Accelerators are usually modified or designed with structured sparsity-optimized architectures for exploiting sparsity. For example, the NVIDIA Ampere architecture introduces a sparse tensor core, which adopts the 2:4 sparsity pattern. We propose a pruning method that builds upon the insight that matrix multiplication generally breaks large matrices into multiple smaller tiles for parallel execution. We present the tile-wise sparsity pattern, which maintains a regular pattern at the tile level for efficient execution but allows irregular pruning at the global scale for high accuracy. In addition, tile-wise sparsity is implemented at the global memory level, while the 2:4 sparsity executes at the register level inside the tensor core. We combine these two patterns into a tile-vector-wise (TVW) sparsity pattern to explore more fine-grained sparsity and further accelerate sparse DNN models. We evaluate TVW on the GPU, achieving average speedups of $1.85\times$, $2.75\times$, and $22.18\times$ over the dense model, block sparsity, and unstructured sparsity, respectively.
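To make the combination of tile-level and 2:4 sparsity concrete, below is a minimal NumPy sketch of how a weight matrix could be pruned into a tile-vector-wise layout: each tile keeps only its strongest columns (a regular pattern inside the tile, but irregular across tiles), and the surviving weights are then thinned to a 2:4 pattern. The function names, tile size, and keep ratio are illustrative assumptions, not the paper's exact algorithm or GPU kernel implementation.

```python
import numpy as np

def prune_2_to_4(block):
    """Zero the 2 smallest-magnitude weights in each group of 4 along a row (2:4 pattern)."""
    flat = block.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]   # indices of the 2 weakest weights per group
    np.put_along_axis(flat, drop, 0.0, axis=1)
    return flat.reshape(block.shape)

def prune_tile_vector_wise(W, tile=(16, 16), col_keep=0.5):
    """Illustrative tile-vector-wise (TVW) pruning sketch (not the paper's exact algorithm):
    - inside each tile, keep only the columns with the largest L1 norm
      (a regular pattern per tile; the kept columns may differ from tile to tile),
    - then enforce the fine-grained 2:4 pattern on the surviving weights.
    """
    H, Wd = W.shape
    th, tw = tile
    keep = int(tw * col_keep)
    assert H % th == 0 and Wd % tw == 0 and keep % 4 == 0
    out = np.zeros_like(W)
    for i in range(0, H, th):
        for j in range(0, Wd, tw):
            t = W[i:i + th, j:j + tw]
            col_norm = np.abs(t).sum(axis=0)
            cols = np.sort(np.argsort(col_norm)[::-1][:keep])  # per-tile column selection
            dense = prune_2_to_4(t[:, cols])                   # 2:4 inside the compacted tile
            block = np.zeros_like(t)
            block[:, cols] = dense
            out[i:i + th, j:j + tw] = block
    return out

W = np.random.randn(64, 64).astype(np.float32)
W_tvw = prune_tile_vector_wise(W)
print("density:", np.count_nonzero(W_tvw) / W_tvw.size)  # 0.25: half the columns kept, then 2 of every 4
```

In this sketch the per-tile column selection stands in for the coarse, memory-level tile-wise pattern, and the 2:4 step stands in for the register-level pattern consumed by the sparse tensor core.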