Deformable TDNN with adaptive receptive fields for speech recognition
FOS: Computer and information sciences
Sound (cs.SD)
03 medical and health sciences
Computer Science - Computation and Language
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
0305 other medical science
Computation and Language (cs.CL)
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
DOI:
10.48550/arxiv.2104.14791
Publication Date:
2021-01-01
AUTHORS (3)
ABSTRACT
Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks like speech recognition, where the temporal dynamics of speech are varied and affected by many factors. This paper proposes to use deformable TDNNs for adaptive temporal dynamics modeling in end-to-end speech recognition. Inspired by deformable ConvNets, deformable TDNNs augment the temporal sampling locations with additional offsets and learn the offsets automatically based on the ASR criterion, without additional supervision. Experiments show that deformable TDNNs obtain state-of-the-art results on WSJ benchmarks (1.42%/3.45% WER on WSJ eval92/dev93 respectively), outperforming standard TDNNs significantly. Furthermore, we propose a latency control mechanism for deformable TDNNs, which enables deformable TDNNs to do streaming ASR without accuracy degradation.
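The core idea in the abstract, augmenting each TDNN tap's fixed time offset with a learned, fractional offset, can be illustrated with a minimal pure-Python sketch of a single deformable TDNN layer. Everything below is an assumption for illustration, not the authors' implementation. The function names (`interp_frame`, `deformable_tdnn_layer`) are hypothetical. Offsets are predicted by a per-frame linear map of the current frame, whereas deformable ConvNets use a convolution over the input. Out-of-range frames are treated as zeros, and fractional positions are read by linear interpolation, which makes the output differentiable with respect to the offsets so they can be trained from the ASR loss (training is not shown). The hypothetical `max_right` argument caps how far into the future any tap may look. This is one simple way to bound latency, not necessarily the paper's latency control mechanism.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def interp_frame(x: Sequence[Vector], pos: float) -> Vector:
    """Linearly interpolate the frame at fractional time `pos`.

    Frames outside [0, T-1] are treated as zeros (an assumed padding choice).
    """
    T, D = len(x), len(x[0])
    lo = math.floor(pos)
    frac = pos - lo
    out = [0.0] * D
    # Blend the two neighbouring integer frames by their distance to `pos`.
    for idx, w in ((lo, 1.0 - frac), (lo + 1, frac)):
        if 0 <= idx < T and w != 0.0:
            for d in range(D):
                out[d] += w * x[idx][d]
    return out


def matvec(W: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def deformable_tdnn_layer(
    x: Sequence[Vector],                # input frames, T x D
    context: Sequence[int],             # fixed TDNN taps, e.g. [-2, 0, 2]
    weights: Sequence[Matrix],          # one D_out x D matrix per tap
    bias: Vector,                       # length D_out
    offset_w: Sequence[Vector],         # hypothetical offset predictor: D weights per tap
    offset_b: Sequence[float],          # hypothetical offset bias per tap
    max_right: Optional[float] = None,  # hypothetical lookahead cap in frames
) -> List[Vector]:
    y = []
    for t in range(len(x)):
        acc = list(bias)
        for k, c in enumerate(context):
            # Learned offset for tap k at frame t (here a linear function of x[t]).
            delta = sum(a * v for a, v in zip(offset_w[k], x[t])) + offset_b[k]
            pos = t + c + delta
            if max_right is not None:
                # Never sample further than `max_right` frames into the future.
                pos = min(pos, t + max_right)
            sample = interp_frame(x, pos)
            for i, v in enumerate(matvec(weights[k], sample)):
                acc[i] += v
        y.append(acc)
    return y
```

With all offsets zero the layer reduces to a standard TDNN: for `x = [[1.0], [2.0], [3.0], [4.0]]`, taps `[-1, 0, 1]` and identity weights it returns `[[3.0], [6.0], [9.0], [7.0]]`. Setting the first tap's offset bias to `0.5` moves that tap to half-frame positions and gives `[[3.5], [6.5], [9.5], [7.5]]`, showing how a learned offset reshapes the receptive field without changing the weights.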