Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

DOI: 10.18653/v1/2022.acl-long.152 Publication Date: 2022-06-03T01:34:53Z
ABSTRACT
As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention in recent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore the crossmodal alignment, or (ii) use vision-language models with general pre-training tasks, which are inadequate to identify fine-grained aspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-Language Pre-training framework for MABSA (VLP-MABSA), which is a unified multimodal encoder-decoder architecture for all the pre-training and downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodal modalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on the MABSA subtasks. Further analysis demonstrates the effectiveness of each pre-training task. The source code is publicly released at https://github.com/NUSTM/VLP-MABSA.
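
The abstract describes a single multimodal encoder-decoder that serves every pre-training and downstream task by mapping image features and text tokens into one shared input sequence. The sketch below illustrates that general idea only; the module names, dimensions, and the use of a plain PyTorch nn.Transformer as the backbone are illustrative assumptions, not the authors' released implementation (see the linked repository for the actual code).

```python
# Minimal sketch, assuming pre-extracted image region features and a generic
# Transformer encoder-decoder: image regions and text tokens are projected into
# a shared hidden space and handled by one model for all tasks.
import torch
import torch.nn as nn


class MultimodalEncoderDecoder(nn.Module):
    def __init__(self, vocab_size=50265, d_model=768, img_feat_dim=2048,
                 nhead=12, num_layers=6):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Project image region features (e.g., from an object detector)
        # into the same hidden space as the text embeddings.
        self.img_proj = nn.Linear(img_feat_dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, img_feats, input_ids, decoder_input_ids):
        # img_feats: (batch, num_regions, img_feat_dim)
        # input_ids, decoder_input_ids: (batch, seq_len)
        text = self.token_emb(input_ids)
        image = self.img_proj(img_feats)
        # One multimodal encoder input: image regions prepended to text tokens.
        encoder_inputs = torch.cat([image, text], dim=1)
        decoder_inputs = self.token_emb(decoder_input_ids)
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(
            decoder_inputs.size(1))
        hidden = self.transformer(encoder_inputs, decoder_inputs,
                                  tgt_mask=tgt_mask)
        # Tasks are cast as sequence generation over the shared decoder.
        return self.lm_head(hidden)


if __name__ == "__main__":
    model = MultimodalEncoderDecoder()
    img_feats = torch.randn(2, 36, 2048)           # 36 detected regions per image
    input_ids = torch.randint(0, 50265, (2, 20))    # source text tokens
    decoder_input_ids = torch.randint(0, 50265, (2, 10))
    logits = model(img_feats, input_ids, decoder_input_ids)
    print(logits.shape)  # torch.Size([2, 10, 50265])
```

Framing every objective as generation over one shared decoder is what allows the same weights to serve the language, vision, and multimodal pre-training tasks as well as the downstream MABSA subtasks.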