Multi-Interaction Modeling with Intelligent Coordination for Multimodal Emotion Recognition

Keywords: Multimodal Interaction; Multimodal Therapy
DOI: 10.20944/preprints202505.1219.v1
Publication Date: 2025-05-19
ABSTRACT
Emotion recognition from multimodal signals, such as speech, text, and facial cues, has garnered increasing attention due to its pivotal role in enhancing human-computer interaction and intelligent communication systems. However, existing approaches often struggle to thoroughly capture the intricacies of multimodal interactions, primarily because of the challenge of effectively fusing heterogeneous modalities while mitigating redundancy and preserving complementary information. In this study, we introduce \textbf{MIMIC}, a novel framework designed to comprehensively model complex multimodal interactions from diverse perspectives. Specifically, MIMIC constructs three parallel latent representations: a modality-preserving full representation, a cross-modal shared representation, and individualized modality-specific representations. Furthermore, a hierarchical semantic-driven fusion strategy is proposed to seamlessly integrate these representations into a cohesive space. Extensive experiments demonstrate that our method not only surpasses prior state-of-the-art methods but also does so with remarkable efficiency, requiring lower computational complexity and significantly fewer trainable parameters. Our contributions are twofold: (1) advancing a multi-perspective modeling approach that enhances the depth of emotion analysis, and (2) offering a streamlined, resource-efficient framework suitable for practical deployment in emotion-aware applications.
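The abstract outlines three parallel latent views (a modality-preserving full representation, a cross-modal shared representation, and per-modality specific representations) combined by a hierarchical fusion step. The following is a minimal, hypothetical PyTorch sketch of that idea; the module names, layer sizes, the averaging of the shared views, and the two-level fusion are assumptions for illustration and are not the paper's actual MIMIC design.

```python
# Hypothetical sketch of a multi-perspective fusion model; not the published MIMIC code.
import torch
import torch.nn as nn


class MIMICSketch(nn.Module):
    def __init__(self, dims, hidden=128, num_classes=6):
        """dims: dict mapping modality name (e.g. 'audio', 'text', 'vision') to feature size."""
        super().__init__()
        self.modalities = list(dims.keys())

        # (1) Modality-preserving full representation: all modality features
        #     concatenated and projected without discarding per-modality detail.
        self.full_enc = nn.Sequential(nn.Linear(sum(dims.values()), hidden), nn.ReLU())

        # (2) Cross-modal shared representation: a common encoder applied to each
        #     projected modality so they land in one shared space.
        self.proj = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        self.shared_enc = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())

        # (3) Modality-specific representations: a private encoder per modality.
        self.private_enc = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for m, d in dims.items()}
        )

        # Hierarchical fusion (assumed form): first merge the shared and private
        # views, then combine the result with the full representation.
        self.fuse_level1 = nn.Linear(hidden * (1 + len(dims)), hidden)
        self.fuse_level2 = nn.Linear(hidden * 2, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, inputs):
        """inputs: dict of per-modality feature tensors, each of shape (batch, dims[m])."""
        full = self.full_enc(torch.cat([inputs[m] for m in self.modalities], dim=-1))
        shared = torch.stack(
            [self.shared_enc(self.proj[m](inputs[m])) for m in self.modalities]
        ).mean(dim=0)  # average the per-modality shared views
        private = [self.private_enc[m](inputs[m]) for m in self.modalities]

        level1 = torch.relu(self.fuse_level1(torch.cat([shared, *private], dim=-1)))
        fused = torch.relu(self.fuse_level2(torch.cat([level1, full], dim=-1)))
        return self.classifier(fused)


if __name__ == "__main__":
    dims = {"audio": 74, "text": 300, "vision": 35}  # illustrative feature sizes
    model = MIMICSketch(dims)
    batch = {m: torch.randn(4, d) for m, d in dims.items()}
    print(model(batch).shape)  # torch.Size([4, 6])
```

The three encoders here are deliberately simple linear blocks; in practice each perspective would likely use sequence encoders and attention, but the sketch is only meant to show how the three representations and a two-stage fusion could fit together.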