Improved Fine-Tuning of Large Multimodal Models for Hateful Meme Detection
FOS: Computer and information sciences
Artificial Intelligence (cs.AI)
Computer Vision and Pattern Recognition (cs.CV)
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI:
10.48550/arxiv.2502.13061
Publication Date:
2025-02-18
AUTHORS (5)
ABSTRACT
Hateful memes have become a significant concern on the Internet, necessitating robust automated detection systems. While large multimodal models have shown strong generalization across various tasks, they exhibit poor generalization to hateful meme detection due to the dynamic nature of memes, which are tied to emerging social trends and breaking news. Recent work further highlights the limitations of conventional supervised fine-tuning for large multimodal models in this context. To address these challenges, we propose Large Multimodal Model Retrieval-Guided Contrastive Learning (LMM-RGCL), a novel two-stage framework designed to improve both in-domain accuracy and cross-domain generalization. Experimental results on six widely used meme classification datasets demonstrate that LMM-RGCL achieves state-of-the-art performance, outperforming agent-based systems such as VPD-PALI-X-55B. Furthermore, our method effectively generalizes to out-of-domain memes under low-resource settings, surpassing models like GPT-4o.
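The abstract does not specify the training objective, so the following is only a minimal illustrative sketch of what a retrieval-guided contrastive objective typically looks like: an InfoNCE-style loss over retrieved neighbor memes, where neighbors sharing the anchor's label act as positives and the remaining retrieved neighbors act as hard negatives. All names here (rgcl_loss, retrieve_neighbors, memory_bank) and the exact loss form are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch of a retrieval-guided contrastive objective (PyTorch).
# NOT the paper's method: names and the loss form are assumptions based on the abstract.
import torch
import torch.nn.functional as F

def retrieve_neighbors(query, memory_bank, k=5):
    """Return indices of the k most cosine-similar embeddings in the memory bank."""
    sims = F.normalize(query, dim=-1) @ F.normalize(memory_bank, dim=-1).T
    return sims.topk(k, dim=-1).indices

def rgcl_loss(anchor, anchor_labels, memory_bank, memory_labels, temperature=0.07):
    """InfoNCE-style loss: retrieved same-label memes are positives,
    the other retrieved neighbors serve as hard negatives."""
    idx = retrieve_neighbors(anchor, memory_bank)                  # (B, k)
    neighbors = memory_bank[idx]                                   # (B, k, D)
    neighbor_labels = memory_labels[idx]                           # (B, k)
    logits = torch.einsum(
        "bd,bkd->bk",
        F.normalize(anchor, dim=-1),
        F.normalize(neighbors, dim=-1),
    ) / temperature
    positives = (neighbor_labels == anchor_labels.unsqueeze(1)).float()
    log_prob = logits.log_softmax(dim=-1)
    # Average negative log-likelihood over the positive neighbors of each anchor.
    loss = -(positives * log_prob).sum(dim=-1) / positives.sum(dim=-1).clamp(min=1)
    return loss.mean()

In practice the memory bank would hold embeddings of previously seen memes with their hatefulness labels, and the retrieval step is what ties the contrastive signal to examples similar to the current input.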