Transformer based unsupervised pre-training for acoustic representation learning

Keywords: Speech translation; Training set; Representation
DOI: 10.48550/arxiv.2007.14602
Publication Date: 2020-01-01
ABSTRACT
Recently, a variety of acoustic tasks and related applications have arisen. For many of these tasks, the amount of labeled data may be limited. To handle this problem, we propose an unsupervised pre-training method using a Transformer-based encoder to learn a general, robust high-level representation for all acoustic tasks. Experiments have been conducted on three kinds of tasks: speech emotion recognition, sound event detection, and speech translation. All experiments show that pre-training on a task's own training data can significantly improve performance. With a larger pre-training corpus combining the MuST-C, Librispeech, and ESC-US datasets, for speech emotion recognition the UAR further improves by an absolute 4.3% on the IEMOCAP dataset. For sound event detection, the F1 score improves by an absolute 1.5% on the DCASE2018 task5 development set and 2.1% on the evaluation set. For speech translation, the BLEU score improves relatively by 12.2% on the En-De dataset and 8.4% on the En-Fr dataset.
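The abstract describes the approach only at a high level: a Transformer encoder is pre-trained on unlabeled acoustic frames and then reused for downstream tasks. Below is a minimal sketch of one plausible instantiation, assuming a masked-frame reconstruction objective in the spirit of masked predictive coding; the class names, masking ratio, model dimensions, and L1 loss are illustrative assumptions, not the paper's reported configuration.

```python
# Sketch: unsupervised pre-training of a Transformer acoustic encoder by
# masking random log-mel frames and reconstructing them. Hyperparameters
# and the masking scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class AcousticEncoder(nn.Module):
    """Transformer encoder mapping log-mel frames to high-level features."""

    def __init__(self, n_mels: int = 80, d_model: int = 256,
                 n_heads: int = 4, n_layers: int = 6):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Reconstruction head used only during pre-training; downstream
        # tasks would attach their own classifier/decoder instead.
        self.recon_head = nn.Linear(d_model, n_mels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, n_mels) -> (batch, time, d_model)
        return self.encoder(self.input_proj(frames))


def masked_reconstruction_loss(model: AcousticEncoder,
                               frames: torch.Tensor,
                               mask_prob: float = 0.15) -> torch.Tensor:
    """Zero out random frames; train the encoder to reconstruct them."""
    mask = torch.rand(frames.shape[:2], device=frames.device) < mask_prob
    corrupted = frames.masked_fill(mask.unsqueeze(-1), 0.0)
    recon = model.recon_head(model(corrupted))
    # L1 loss computed only over the masked positions.
    return (recon - frames).abs()[mask].mean()


if __name__ == "__main__":
    model = AcousticEncoder()
    batch = torch.randn(8, 200, 80)  # fake log-mel features
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss = masked_reconstruction_loss(model, batch)
    loss.backward()
    opt.step()
    print(f"pre-training loss: {loss.item():.4f}")
```

After pre-training on unlabeled audio, the same encoder weights would be fine-tuned with task-specific heads for emotion recognition, sound event detection, or translation, which is what makes a single general representation serve all three task families.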