BAT: Boundary aware transducer for memory-efficient and low-latency ASR
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Electrical Engineering and Systems Science - Audio and Speech Processing
DOI:
10.21437/interspeech.2023-770
Publication Date:
2023-08-14T08:22:20Z
AUTHORS (3)
ABSTRACT
accepted into INTERSPEECH2023<br/>Recently, recurrent neural network transducer (RNN-T) gains increasing popularity due to its natural streaming capability as well as superior performance. Nevertheless, RNN-T training requires large time and computation resources as RNN-T loss calculation is slow and consumes a lot of memory. Another limitation of RNN-T is that it tends to access more contexts for better performance, thus leading to higher emission latency in streaming ASR. In this paper we propose boundary-aware transducer (BAT) for memory-efficient and low-latency ASR. In BAT, the lattice for RNN-T loss computation is reduced to a restricted region selected by the alignment from continuous integrate-and-fire (CIF), which is jointly optimized with the RNN-T model. Extensive experiments demonstrate that compared to RNN-T, BAT reduces time and memory consumption significantly in training, and achieves good CER-latency trade-offs in inference for streaming ASR.<br/>
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (0)
CITATIONS (2)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....