Pre-Finetuning for Few-Shot Emotional Speech Recognition
SUBJECTS:
Machine Learning (cs.LG); Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
DOI:
10.21437/interspeech.2023-136
Publication Date:
2023-08-14T04:22:20Z
AUTHORS (2)
ABSTRACT
Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.
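The two-stage recipe the abstract describes (pre-finetune on an intermediate emotion corpus, then fine-tune on the few-shot target task) can be sketched as below. This is a minimal illustration, not the authors' code: a tiny stand-in encoder replaces Wav2Vec2.0 so the example runs without downloading a pretrained checkpoint, and the data, dimensions, and class counts are all made up for demonstration.

```python
# Minimal sketch of pre-finetuning followed by few-shot fine-tuning.
# TinyEncoder is a hypothetical stand-in for the Wav2Vec2.0 backbone.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in speech encoder: projects raw samples and mean-pools over time."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.dim = dim
        self.proj = nn.Linear(1, dim)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> (batch, dim) utterance embedding
        feats = self.proj(wav.unsqueeze(-1))
        return feats.mean(dim=1)

def finetune(encoder: TinyEncoder, num_classes: int,
             xs: torch.Tensor, ys: torch.Tensor, steps: int = 20) -> TinyEncoder:
    """Attach a fresh classification head and train encoder + head jointly."""
    head = nn.Linear(encoder.dim, num_classes)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(head(encoder(xs)), ys)
        loss.backward()
        opt.step()
    return encoder  # head is discarded; only encoder weights transfer

torch.manual_seed(0)
enc = TinyEncoder()
# Stage 1: pre-finetune on a (synthetic) intermediate emotion corpus.
enc = finetune(enc, num_classes=4,
               xs=torch.randn(16, 100), ys=torch.randint(0, 4, (16,)))
# Stage 2: few-shot fine-tune on the downstream target task, reusing weights.
enc = finetune(enc, num_classes=5,
               xs=torch.randn(8, 100), ys=torch.randint(0, 5, (8,)))
```

The key design point the paper studies is stage 1: which intermediate corpora (and in what combination) leave the encoder best prepared for the few-shot stage 2 task.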