Pre-Finetuning for Few-Shot Emotional Speech Recognition

FOS: Computer and information sciences; Electrical engineering, electronic engineering, information engineering
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
DOI: 10.48550/arxiv.2302.12921 Publication Date: 2023-08-20
ABSTRACT
Published at INTERSPEECH 2023. 5 pages, 4 figures. Code available at https://github.com/maxlchen/Speech-PreFinetuning

Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.
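The two-stage recipe the abstract describes (pre-finetune Wav2Vec2.0 on emotional speech corpora, then fine-tune on a few-shot downstream split) can be sketched roughly as below. This is a minimal illustration assuming the Hugging Face transformers Wav2Vec2ForSequenceClassification head and 16 kHz audio; the dataset loaders, label counts, and hyperparameters are placeholder assumptions rather than the authors' settings, and their actual implementation lives in the linked repository.

```python
# Minimal sketch of pre-finetuning followed by few-shot fine-tuning.
# Label counts, loaders, and hyperparameters below are assumptions, not the
# authors' configuration; see https://github.com/maxlchen/Speech-PreFinetuning.
import torch
from torch.optim import AdamW
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")

def train(model, waveforms, labels, epochs=3, lr=1e-5):
    """Generic supervised loop over raw 16 kHz waveforms (lists of floats)."""
    optimizer = AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        inputs = extractor(waveforms, sampling_rate=16000,
                           return_tensors="pt", padding=True)
        out = model(input_values=inputs.input_values,
                    labels=torch.tensor(labels))
        out.loss.backward()   # single full-batch step per epoch, for brevity
        optimizer.step()
        optimizer.zero_grad()
    return model

# Stage 1: pre-finetune on an emotional speech recognition corpus (or a
# permutation of several corpora, as in the paper).
prefinetune_model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=4)  # 4 emotion classes (assumed)
# pre_waveforms, pre_labels = load_prefinetuning_corpus()   # hypothetical loader
# prefinetune_model = train(prefinetune_model, pre_waveforms, pre_labels)

# Stage 2: transfer the pre-finetuned encoder to the downstream few-shot task
# (e.g. the Emotional Speech Dataset) with a freshly initialized classifier head.
downstream_model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=5)  # 5 ESD emotion classes
downstream_model.wav2vec2.load_state_dict(prefinetune_model.wav2vec2.state_dict())
# few_shot_waveforms, few_shot_labels = sample_few_shot_split()  # hypothetical
# downstream_model = train(downstream_model, few_shot_waveforms, few_shot_labels)
```

Copying only the shared encoder weights and re-initializing the classification head reflects the usual transfer-learning setup when the pre-finetuning and downstream label sets differ; whether the paper reuses or resets the head is not stated in the abstract.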