Acoustic Feature Excitation-and-Aggregation Network Based on Multi-Task Learning for Speech Emotion Recognition
DOI:
10.3390/electronics14050844
Publication Date:
2025-02-21T12:53:06Z
AUTHORS (5)
ABSTRACT
In recent years, substantial research has focused on emotion recognition using multi-stream speech representations. In existing speech emotion recognition (SER) approaches, effectively extracting and fusing features is crucial. To overcome the bottleneck in SER caused by the fusion of inter-feature information, including challenges such as modeling complex feature relations and the inefficiency of existing fusion methods, this paper proposes an acoustic feature excitation-and-aggregation framework based on multi-task learning, named AFEA-Net. The framework consists of speech emotion alignment learning (SEAL), an acoustic feature excitation-and-aggregation mechanism (AFEA), and speech continuity learning. First, SEAL aligns sentiment information between WavLM and Fbank features. Then, we design the AFEA mechanism to adaptively calibrate and merge the two features. Furthermore, we introduce a speech continuity learning strategy to explore the distinctiveness and complementarity of the dual-stream features from intra- and inter-speech perspectives. Experimental results on the publicly available IEMOCAP and RAVDESS datasets show that our proposed approach outperforms state-of-the-art approaches. Specifically, we achieve 75.1% WA, 75.3% UAR, 76% precision, and 75.4% F1-score on IEMOCAP, and 80.3%, 80.6%, 80.8%, and 80.4% on RAVDESS, respectively.
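To make the excitation-and-aggregation idea concrete, below is a minimal PyTorch sketch of squeeze-and-excitation-style gating over two time-aligned acoustic streams (e.g., WavLM embeddings and Fbank features projected to a shared width). The module name, bottleneck reduction, pooling, and sum aggregation are illustrative assumptions, not the paper's actual AFEA implementation.

    import torch
    import torch.nn as nn

    class ExcitationAggregationFusion(nn.Module):
        """Hypothetical excitation-and-aggregation fusion of two streams.

        Both inputs are (batch, time, dim) and already aligned in time
        and dimension; names and structure are illustrative only.
        """

        def __init__(self, dim: int, reduction: int = 4):
            super().__init__()
            # Squeeze-and-excitation style bottleneck that produces
            # per-channel gates for each of the two streams.
            self.excite = nn.Sequential(
                nn.Linear(2 * dim, dim // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(dim // reduction, 2 * dim),
                nn.Sigmoid(),
            )

        def forward(self, wavlm: torch.Tensor, fbank: torch.Tensor) -> torch.Tensor:
            # "Squeeze": pool each stream over time, then concatenate.
            pooled = torch.cat([wavlm.mean(dim=1), fbank.mean(dim=1)], dim=-1)
            gates = self.excite(pooled)                # (batch, 2 * dim)
            g_wavlm, g_fbank = gates.chunk(2, dim=-1)  # per-stream gates
            # "Excite" each stream, then aggregate by summation.
            return g_wavlm.unsqueeze(1) * wavlm + g_fbank.unsqueeze(1) * fbank

    if __name__ == "__main__":
        # Toy usage: 2 utterances, 100 frames, 768-dim streams.
        fusion = ExcitationAggregationFusion(dim=768)
        wavlm_feats = torch.randn(2, 100, 768)
        fbank_feats = torch.randn(2, 100, 768)
        print(fusion(wavlm_feats, fbank_feats).shape)  # torch.Size([2, 100, 768])

Gating each stream before summation lets the network adaptively recalibrate how much the learned (WavLM) and handcrafted (Fbank) representations contribute per channel, which is one plausible reading of the abstract's "adaptively calibrate and merge" step.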