Transformer based unsupervised pre-training for acoustic representation learning

Keywords: Speech translation; Training set; Representation
DOI: 10.48550/arxiv.2007.14602
Publication Date: 2020-01-01
ABSTRACT
Recently, a variety of acoustic tasks and related applications have arisen. For many of these tasks, the amount of labeled data may be limited. To handle this problem, we propose an unsupervised pre-training method using a Transformer-based encoder to learn a general, robust high-level representation for all acoustic tasks. Experiments have been conducted on three kinds of tasks: speech emotion recognition, sound event detection, and speech translation. All experiments show that pre-training on a task's own training data can significantly improve performance. With a larger pre-training corpus combining the MuST-C, Librispeech, and ESC-US datasets, for speech emotion recognition the UAR further improves by an absolute 4.3% on the IEMOCAP dataset. For sound event detection, the F1 score improves by an absolute 1.5% on the DCASE2018 task5 development set and 2.1% on the evaluation set. For speech translation, the BLEU score improves relatively by 12.2% on the En-De dataset and 8.4% on the En-Fr dataset.
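The abstract describes the approach only at a high level: a Transformer encoder is pre-trained on unlabeled acoustic frames and then reused for downstream tasks. Below is a minimal sketch of one plausible instantiation, assuming a masked-frame reconstruction objective in the spirit of masked predictive coding; the class names, masking ratio, model dimensions, and L1 loss are illustrative assumptions, not the paper's reported configuration.

```python
# Sketch: unsupervised pre-training of a Transformer acoustic encoder by
# masking random log-mel frames and reconstructing them. Hyperparameters
# and the masking scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class AcousticEncoder(nn.Module):
    """Transformer encoder mapping log-mel frames to high-level features."""

    def __init__(self, n_mels: int = 80, d_model: int = 256,
                 n_heads: int = 4, n_layers: int = 6):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Reconstruction head used only during pre-training; downstream
        # tasks would attach their own classifier/decoder instead.
        self.recon_head = nn.Linear(d_model, n_mels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, n_mels) -> (batch, time, d_model)
        return self.encoder(self.input_proj(frames))


def masked_reconstruction_loss(model: AcousticEncoder,
                               frames: torch.Tensor,
                               mask_prob: float = 0.15) -> torch.Tensor:
    """Zero out random frames; train the encoder to reconstruct them."""
    mask = torch.rand(frames.shape[:2], device=frames.device) < mask_prob
    corrupted = frames.masked_fill(mask.unsqueeze(-1), 0.0)
    recon = model.recon_head(model(corrupted))
    # L1 loss computed only over the masked positions.
    return (recon - frames).abs()[mask].mean()


if __name__ == "__main__":
    model = AcousticEncoder()
    batch = torch.randn(8, 200, 80)  # fake log-mel features
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss = masked_reconstruction_loss(model, batch)
    loss.backward()
    opt.step()
    print(f"pre-training loss: {loss.item():.4f}")
```

After pre-training on unlabeled audio, the same encoder weights would be fine-tuned with task-specific heads for emotion recognition, sound event detection, or translation, which is what makes a single general representation serve all three task families.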