NFDI4DS | UHH-SEMS - Publication Details

Deformable TDNN with adaptive receptive fields for speech recognition

FOS: Computer and information sciences; Sound (cs.SD); 03 medical and health sciences; Computer Science - Computation and Language; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; 0305 other medical science; Computation and Language (cs.CL); Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing

DOI: 10.48550/arxiv.2104.14791

Publication Date: 2021-01-01

AUTHORS (3)

Keyu An

Yi Zhang

Zhijian Ou

ABSTRACT

Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks like speech recognition, where the temporal dynamics of speech are varied and affected by many factors. This paper proposes to use deformable TDNNs for adaptive temporal dynamics modeling in end-to-end speech recognition. Inspired by deformable ConvNets, deformable TDNNs augment the temporal sampling locations with additional offsets and learn the offsets automatically based on the ASR criterion, without additional supervision. Experiments show that deformable TDNNs obtain state-of-the-art results on WSJ benchmarks (1.42%/3.45% WER on WSJ eval92/dev93 respectively), outperforming standard TDNNs significantly. Furthermore, we propose a latency control mechanism for deformable TDNNs, which enables deformable TDNNs to do streaming ASR without accuracy degradation.
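
The idea of adding offsets to the temporal sampling locations can be illustrated with a small, single-channel sketch. This is not the authors' implementation. The function names (`sample_linear`, `deformable_tdnn_frame`), the linear interpolation at fractional positions, the zero padding outside the sequence, and the hand-picked offsets are all assumptions made for illustration. In the paper the offsets are learned from the ASR criterion, whereas here they are simply passed in.

```python
import math


def sample_linear(x, pos):
    """Linearly interpolate sequence x at a fractional position.

    Positions outside the sequence contribute zero (assumed zero padding).
    """
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_tdnn_frame(x, t, context, weights, offsets):
    """Output at frame t for one channel.

    A standard TDNN samples x at t + c for each context tap c.
    Here each tap is shifted by an extra offset d, so it samples x at t + c + d.
    """
    return sum(
        w * sample_linear(x, t + c + d)
        for c, w, d in zip(context, weights, offsets)
    )


# Toy input: one feature value per frame (hypothetical data).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
context = [-1, 0, 1]          # fixed TDNN context taps
weights = [1.0, 1.0, 1.0]     # toy filter weights

# Zero offsets reduce to a standard TDNN: x[1] + x[2] + x[3] = 6.0
standard = deformable_tdnn_frame(x, 2, context, weights, [0.0, 0.0, 0.0])

# Non-zero offsets move the taps to positions 1.5, 2.0 and 4.5: 1.5 + 2.0 + 4.5 = 8.0
shifted = deformable_tdnn_frame(x, 2, context, weights, [0.5, 0.0, 1.5])

print(standard, shifted)
```

With all offsets set to zero the layer behaves like a fixed-context TDNN. Non-zero offsets stretch or shrink the receptive field per frame, which is the adaptive behaviour the abstract describes.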
