Transformer based unsupervised pre-training for acoustic representation learning
KEYWORDS:
Speech translation, Training set, Representation
DOI:
10.48550/arxiv.2007.14602
Publication Date:
2020-01-01
AUTHORS (6)
ABSTRACT
Recently, a variety of acoustic tasks and related applications have arisen. For many of these tasks, the amount of labeled data may be limited. To handle this problem, we propose an unsupervised pre-training method using a Transformer based encoder to learn a general, robust high-level representation for all acoustic tasks. Experiments have been conducted on three kinds of tasks: speech emotion recognition, sound event detection and speech translation. All experiments show that pre-training on a task's own training data can significantly improve performance. With a larger pre-training corpus combining the MuST-C, Librispeech and ESC-US datasets, for speech emotion recognition the UAR further improves by an absolute 4.3% on the IEMOCAP dataset. For sound event detection, the F1 score further improves by an absolute 1.5% on the DCASE2018 task5 development set and 2.1% on the evaluation set. For speech translation, the BLEU score further improves by a relative 12.2% on the En-De dataset and 8.4% on the En-Fr dataset.
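The abstract does not spell out the pre-training objective. A common instantiation of Transformer based unsupervised acoustic pre-training is masked frame reconstruction (masked predictive coding): random log-mel frames are masked and the encoder is trained to reconstruct them. The following is a minimal PyTorch sketch under that assumption; the module names, hyperparameters, masking scheme and L1 reconstruction loss are illustrative choices, not the paper's reported configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AcousticEncoder(nn.Module):
        """Transformer encoder over log-mel frames with a frame-reconstruction
        head for masked pre-training. All sizes are illustrative defaults."""
        def __init__(self, n_mels=80, d_model=256, n_heads=4, n_layers=6, max_len=2000):
            super().__init__()
            self.input_proj = nn.Linear(n_mels, d_model)
            self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions
            layer = nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=1024, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.recon_head = nn.Linear(d_model, n_mels)  # used only in pre-training

        def forward(self, feats):
            # feats: (batch, frames, n_mels) -> (batch, frames, d_model)
            pos = torch.arange(feats.size(1), device=feats.device)
            h = self.input_proj(feats) + self.pos_emb(pos)
            return self.encoder(h)

    def mask_frames(feats, mask_prob=0.15):
        """Zero out a random subset of frames; one common masking scheme."""
        mask = torch.rand(feats.shape[:2], device=feats.device) < mask_prob
        masked = feats.clone()
        masked[mask] = 0.0
        return masked, mask

    def pretrain_step(model, feats, optimizer):
        """One unsupervised step: reconstruct only the masked frames (L1 loss)."""
        masked, mask = mask_frames(feats)
        recon = model.recon_head(model(masked))
        loss = F.l1_loss(recon[mask], feats[mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Dummy usage: a batch of 8 utterances, 200 frames of 80-dim log-mel each.
    model = AcousticEncoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    feats = torch.randn(8, 200, 80)
    print(pretrain_step(model, feats, optimizer))

After pre-training in this style, the reconstruction head would be discarded and the encoder output used as the representation for downstream fine-tuning on emotion recognition, sound event detection or speech translation, matching the workflow the abstract describes.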