Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition
FOS: Computer and information sciences
Sound (cs.SD)
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
DOI:
10.21437/interspeech.2021-415
Publication Date:
2021-08-27
AUTHORS (4)
ABSTRACT
End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, the Transformer and the Conformer achieve state-of-the-art recognition accuracy, with self-attention playing a vital role in capturing important global information. However, the time and memory complexity of self-attention grows quadratically with sentence length. In this paper, a prob-sparse self-attention mechanism is introduced into the Conformer to sparsify the computation of self-attention, accelerating inference and reducing memory consumption. Specifically, we adopt a Kullback-Leibler divergence based sparsity measurement for each query to decide whether to compute the attention function on that query. With the prob-sparse attention mechanism, we achieve an impressive 8% to 45% inference speed-up and a 15% to 45% memory usage reduction in the self-attention module of the Conformer Transducer while maintaining the same level of error rate.
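The abstract only states that a KL-divergence based sparsity measurement per query decides which queries get the full attention computation. The sketch below is a minimal single-head NumPy illustration assuming the Informer-style ProbSparse approximation of that measurement (max score minus mean score per query) and a mean-of-V fallback for skipped queries; the sampling scheme, top-u budget, and fallback used in the paper are not given here, so those choices are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of one prob-sparse self-attention step (single head).
# Assumptions (not from the paper): Informer-style max-minus-mean proxy for the
# KL-divergence sparsity measurement, u = c * ln(L) selected queries, and the
# mean of V as the output for queries whose attention is skipped.
import numpy as np

def prob_sparse_attention(Q, K, V, sample_factor=5):
    """Q, K, V: (L, d) arrays for a single attention head."""
    L, d = Q.shape
    scale = 1.0 / np.sqrt(d)

    # Full score matrix is computed here only for clarity; the actual method
    # estimates the measurement from a sampled subset of keys to stay sub-quadratic.
    scores = (Q @ K.T) * scale                          # (L, L)

    # Sparsity measurement per query: queries whose score distribution is close
    # to uniform (low max-minus-mean) are skipped; "dominant" queries are kept.
    sparsity = scores.max(axis=1) - scores.mean(axis=1)

    # Keep only the top-u queries.
    u = min(L, int(sample_factor * np.ceil(np.log(L))))
    top = np.argsort(-sparsity)[:u]

    # Lazy queries fall back to the mean of V (uniform-attention approximation).
    out = np.repeat(V.mean(axis=0, keepdims=True), L, axis=0)

    # Full softmax attention only for the selected queries.
    sel = scores[top]                                   # (u, L)
    sel = np.exp(sel - sel.max(axis=1, keepdims=True))
    attn = sel / sel.sum(axis=1, keepdims=True)
    out[top] = attn @ V
    return out

# Toy usage: 64 frames, 32-dim head.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32))
print(prob_sparse_attention(x, x, x).shape)  # (64, 32)
```

Because only u = O(ln L) queries run the full attention computation, the cost of this step drops from O(L^2) toward O(L ln L), which is the source of the reported speed and memory savings.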