Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network

Keywords: Feature (linguistics), Modalities, Sensor Fusion
DOI: 10.3390/electronics12163504 Publication Date: 2023-08-18T13:13:44Z
ABSTRACT
In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of sentiments by fusing multimodal information, thereby enhancing the understanding of real-world environments. The key challenges lie in handling noise in the acquired data and achieving effective multimodal fusion. When processing noisy data, existing methods utilize a combination of features to mitigate word-recognition errors caused by the performance limitations of automatic speech recognition (ASR) models. However, the problem remains of how to more efficiently combine different modalities to address such noise. During fusion, most existing methods have limited adaptability to the feature differences between modalities, making it difficult to capture the potentially complex nonlinear interactions that may exist between modalities. To overcome these issues, this paper proposes a new framework named multimodal-word-refinement and cross-modal-hierarchy (MWRCMH) fusion. Specifically, we utilized a multimodal word-correction module to reduce recognition errors caused by ASR. During fusion, we designed a cross-modal hierarchical fusion module that employed cross-modal attention mechanisms to fuse pairs of modalities, yielding fused bimodal-feature information. Then, the obtained bimodal information and the unimodal information were fused through a hierarchical layer to obtain the final multimodal sentiment representation. Experimental results on the MOSI-SpeechBrain, MOSI-IBM, and MOSI-iFlytek datasets demonstrated that the proposed approach outperformed other comparative methods, with Has0-F1 scores of 76.43%, 80.15%, and 81.93%, respectively. Our approach exhibited better performance compared with multiple baselines.
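The abstract does not specify the fusion architecture in detail; as a rough illustration of the general idea, a minimal sketch of pairwise cross-modal attention fusion (all function names, feature shapes, and the toy data are hypothetical, not taken from the paper) might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats):
    """One modality (query) attends over another (key/value),
    producing a bimodal feature aligned to the query sequence."""
    d_k = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d_k)   # (Tq, Tk)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ key_feats                           # (Tq, d)

# Toy unimodal sequence features: (seq_len, feature_dim)
rng = np.random.default_rng(0)
text  = rng.standard_normal((4, 8))
audio = rng.standard_normal((6, 8))

# Pairwise fusion: text attends to audio, then the bimodal result
# is concatenated with the unimodal text features (a simple stand-in
# for the paper's hierarchical fusion layer).
text_audio = cross_modal_attention(text, audio)
fused = np.concatenate([text, text_audio], axis=-1)
print(fused.shape)  # (4, 16)
```

In a full model each pair of modalities (text–audio, text–vision, audio–vision) would be fused this way, and the bimodal outputs combined with the unimodal features in a further fusion layer before the sentiment head.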