Multi-level textual-visual alignment and fusion network for multimodal aspect-based sentiment analysis
DOI: 10.1007/s10462-023-10685-z
Publication Date: 2024-03-01
ABSTRACT
Multimodal Aspect-Based Sentiment Analysis (MABSA) is an essential task in sentiment analysis that has garnered considerable attention in recent years. Typical approaches to MABSA often utilize cross-modal Transformers to capture interactions between the textual and visual modalities. However, bridging the semantic gap between the modality spaces and addressing interference from irrelevant objects at different scales remain challenging. To tackle these limitations, we present the Multi-level Textual-Visual Alignment and Fusion Network (MTVAF) in this work, which incorporates three auxiliary tasks. Specifically, MTVAF first transforms multi-level image information into textual form: image descriptions, facial descriptions, and optical characters. These are then concatenated with the textual input to form a textual+visual input, facilitating comprehensive textual-visual alignment. Next, both inputs are fed into an integrated text model to obtain the relevant representations. Dynamic attention mechanisms are employed to generate visual prompts that control cross-modal fusion. Finally, we align the probability distributions of the textual modality space and the textual+visual modality space, effectively reducing the noise introduced during the alignment process. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed MTVAF, showcasing its superior performance compared to state-of-the-art approaches. Our code is available at https://github.com/MKMaS-GUET/MTVAF .
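
To make the multi-level transformation concrete, the sketch below shows one way the textual+visual input described in the abstract could be assembled. The helpers caption_image, describe_faces, and run_ocr, as well as the separator template, are hypothetical stand-ins for an image captioner, a facial-description model, and an OCR engine; the authors' actual pipeline is in the linked repository.

```python
# A minimal sketch (not the authors' implementation) of MTVAF's
# multi-level image-to-text transformation and concatenation.

def caption_image(image) -> str:
    # Placeholder: a captioning model would return a scene description.
    return "a man holding a trophy"

def describe_faces(image) -> str:
    # Placeholder: a facial-attribute model would describe detected faces.
    return "one smiling face"

def run_ocr(image) -> str:
    # Placeholder: an OCR engine would return optical characters in the image.
    return "CHAMPION 2023"

def build_textual_visual_input(text: str, image) -> str:
    # Concatenate the sentence with the multi-level image descriptions so a
    # single text model can process both modalities in one textual space.
    visual_text = " ".join([caption_image(image), describe_faces(image), run_ocr(image)])
    return f"{text} </s> {visual_text}"  # separator token is illustrative

print(build_textual_visual_input("Messi lifts the cup", image=None))
```

The design point this illustrates is that, once all visual signals are verbalized, alignment happens inside a single text encoder rather than across heterogeneous embedding spaces.

The final step, aligning the probability distributions of the textual space and the textual+visual space, is commonly realized as a KL-divergence objective; the snippet below is one plausible reading of that step, assuming PyTorch and classifier logits over the sentiment labels. It is a sketch under those assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def distribution_alignment_loss(text_logits: torch.Tensor,
                                text_visual_logits: torch.Tensor) -> torch.Tensor:
    # Pull the textual+visual prediction toward the text-only prediction,
    # which damps noise introduced by irrelevant visual content.
    log_p = F.log_softmax(text_visual_logits, dim=-1)  # log-probs, fused input
    q = F.softmax(text_logits, dim=-1)                 # probs, text-only input
    return F.kl_div(log_p, q, reduction="batchmean")

# Usage: logits over 3 sentiment labels for a batch of 4 examples.
loss = distribution_alignment_loss(torch.randn(4, 3), torch.randn(4, 3))
```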
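In this reading, the text-only distribution acts as the reference signal and the fused distribution is regularized toward it, which matches the abstract's claim that alignment reduces noise introduced during fusion.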