NFDI4DS | UHH-SEMS - Publication Details

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

FOS: Computer and information sciences; Computer Science - Machine Learning; Sound (cs.SD); Computer Science - Computation and Language; Machine Learning (stat.ML); Computer Science - Sound; Machine Learning (cs.LG); 03 medical and health sciences; Statistics - Machine Learning; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; 0305 other medical science; Computation and Language (cs.CL); Electrical Engineering and Systems Science - Audio and Speech Processing

DOI: 10.21437/interspeech.2019-2680

Publication Date: 2019-09-13T20:32:51Z

AUTHORS (7)

Daniel S. Park

William Chan

Yu Zhang

Chung-Cheng Chiu

Barret Zoph

Ekin D. Cubuk

Quoc V. Le

ABSTRACT

5 pages, 3 figures, 6 tables; v3: references added

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5'00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.
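The two masking steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mask_spectrogram`, its parameters, the default widths, and the choice to fill masked blocks with zeros are all assumptions made for illustration, and the feature-warping step is omitted. The input is assumed to be a 2-D array of filter bank features with shape (time steps, frequency channels).

```python
import numpy as np


def mask_spectrogram(features, num_freq_masks=1, max_freq_width=8,
                     num_time_masks=1, max_time_width=20, rng=None):
    """Return a copy of `features` (time_steps x freq_channels) with masked blocks.

    Hypothetical helper: names, defaults, and the zero fill value are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(features, dtype=float, copy=True)
    num_steps, num_channels = out.shape

    # Mask blocks of consecutive frequency channels across all time steps.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(max_freq_width, num_channels) + 1))
        start = int(rng.integers(0, num_channels - width + 1))
        out[:, start:start + width] = 0.0

    # Mask blocks of consecutive time steps across all frequency channels.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(max_time_width, num_steps) + 1))
        start = int(rng.integers(0, num_steps - width + 1))
        out[start:start + width, :] = 0.0

    return out
```

A call such as `mask_spectrogram(feats, rng=np.random.default_rng(0))` returns a new array and leaves `feats` unchanged, so the same utterance can be augmented differently on each training pass.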

CITATIONS (1743)
