Deformable TDNN with adaptive receptive fields for speech recognition

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Fields of science: Computer and information sciences; Electrical engineering, electronic engineering, information engineering
DOI: 10.48550/arxiv.2104.14791 Publication Date: 2021-01-01
ABSTRACT
Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks like speech recognition, where the temporal dynamics of speech are varied and affected by many factors. This paper proposes to use deformable TDNNs for adaptive temporal dynamics modeling in speech recognition. Inspired by deformable ConvNets, deformable TDNNs augment the temporal sampling locations with additional offsets and learn the offsets automatically based on the ASR criterion, without additional supervision. Experiments show that deformable TDNNs obtain state-of-the-art results on the WSJ benchmarks (1.42%/3.45% WER on eval92/dev93 respectively), outperforming standard TDNNs significantly. Furthermore, we propose a latency control mechanism for deformable TDNNs, which enables them to do streaming ASR without accuracy degradation.
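The core idea in the abstract, sampling each layer input at learned fractional offsets rather than at fixed context positions, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function name `deformable_tdnn_layer`, the linear interpolation between neighbouring frames, the edge-replicating boundary handling, and the `max_lookahead` clamp standing in for latency control are all assumptions made for illustration. In a trained model the offsets would be predicted from the input and learned with the ASR loss; here they are simply passed in.

```python
import numpy as np

def deformable_tdnn_layer(x, weight, offsets, dilation=1, max_lookahead=None):
    """Illustrative deformable temporal convolution (hypothetical helper).

    x:       (T, C_in) input frames
    weight:  (K, C_in, C_out) one weight matrix per context tap
    offsets: (T, K) fractional frame offsets, one per output frame and tap
    Returns: (T, C_out)
    """
    T = x.shape[0]
    K = weight.shape[0]
    # Fixed TDNN context, e.g. K=3, dilation=1 gives taps at t-1, t, t+1.
    taps = (np.arange(K) - K // 2) * dilation
    t = np.arange(T)[:, None]                      # (T, 1)
    pos = t + taps[None, :] + offsets              # (T, K) sampling positions
    if max_lookahead is not None:
        # Assumed form of latency control: never look further ahead than
        # max_lookahead frames past the current frame.
        pos = np.minimum(pos, t + max_lookahead)
    pos = np.clip(pos, 0, T - 1)                   # replicate edge frames (assumption)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    frac = (pos - lo)[..., None]                   # (T, K, 1)
    # Linear interpolation between the two nearest frames.
    sampled = (1.0 - frac) * x[lo] + frac * x[hi]  # (T, K, C_in)
    # Sum over taps and input channels, as in an ordinary TDNN layer.
    return np.einsum("tkc,kco->to", sampled, weight)

# With all-zero offsets the layer reduces to a standard TDNN layer.
x = np.arange(5, dtype=float)[:, None]             # 5 frames, 1 channel
w = np.ones((3, 1, 1))                             # context {-1, 0, +1}
y_fixed = deformable_tdnn_layer(x, w, np.zeros((5, 3)))
y_shifted = deformable_tdnn_layer(x, w, np.full((5, 3), 0.5))
y_streaming = deformable_tdnn_layer(x, w, np.full((5, 3), 0.5), max_lookahead=0)
```

Because the interpolation is differentiable with respect to the offsets, an autodiff framework could train them by gradient descent on the recognition loss alone, which matches the abstract's statement that no additional supervision is needed.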