Improved Fine-Tuning of Large Multimodal Models for Hateful Meme Detection

FOS: Computer and information sciences; Machine Learning (cs.LG); Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2502.13061
Publication Date: 2025-02-18
ABSTRACT
Hateful memes have become a significant concern on the Internet, necessitating robust automated detection systems. While large multimodal models have shown strong generalization across various tasks, they exhibit poor generalization to hateful meme detection due to the dynamic nature of memes tied to emerging social trends and breaking news. Recent work further highlights the limitations of conventional supervised fine-tuning for large multimodal models in this context. To address these challenges, we propose Large Multimodal Model Retrieval-Guided Contrastive Learning (LMM-RGCL), a novel two-stage fine-tuning framework designed to improve both in-domain accuracy and cross-domain generalization. Experimental results on six widely used meme classification datasets demonstrate that LMM-RGCL achieves state-of-the-art performance, outperforming agent-based systems such as VPD-PALI-X-55B. Furthermore, our method effectively generalizes to out-of-domain memes under low-resource settings, surpassing models like GPT-4o.