ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
Softmax function
TOPS
Interfacing
DOI:
10.48550/arxiv.2307.03493
Publication Date:
2023-01-01
AUTHORS (7)
ABSTRACT
Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains, such as computer vision and audio processing. However, efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm$^2$ in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.
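The abstract's central idea of a softmax that "operates exclusively on integer values" can be illustrated with a minimal NumPy sketch. The sketch below replaces the exponential with a power-of-two approximation so that every unnormalized term reduces to a bit shift followed by one integer division; the function name `integer_softmax`, the 30-bit numerator, and the 8-bit output width are illustrative assumptions, and the code does not reproduce ITA's actual streaming hardware datapath described in the paper.

```python
import numpy as np

def integer_softmax(x_q: np.ndarray, out_bits: int = 8) -> np.ndarray:
    """Illustrative integer-only softmax over quantized logits (not the ITA datapath)."""
    x = x_q.astype(np.int64)
    # Subtract the maximum so every exponent argument is <= 0.
    diff = x - x.max()
    # Replace exp(d) with 2**d: each unnormalized term is a fixed
    # power-of-two numerator shifted right by |d| bits.
    shift = np.minimum(-diff, 62)
    numer = np.int64(1 << 30) >> shift
    denom = numer.sum()  # integer partition function
    # Renormalize to an unsigned out_bits-wide probability vector.
    return ((numer * ((1 << out_bits) - 1)) // denom).astype(np.uint8)

# Example: the largest logit dominates the integer probability mass.
print(integer_softmax(np.array([12, 10, 10, 3], dtype=np.int8)))  # e.g. [169 42 42 0]
```

The appeal of a base-2 formulation in hardware is that the "exponential" becomes a barrel shift and the normalization a single integer division, avoiding floating-point units entirely; how ITA realizes this on-the-fly over a stream of attention scores is detailed in the paper itself.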