Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

FOS: Computer and information sciences
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2403.08730 Publication Date: 2024-03-13
ABSTRACT
Multimodal Large Language Models (MLLMs) excel in generating responses based on visual inputs. However, they often suffer from a bias towards generating responses similar to their pretraining corpus, overshadowing the importance of visual information. We treat this bias as a "preference" for pretraining statistics, which hinders the model's grounding in visual input. To mitigate this issue, we propose Bootstrapped Preference Optimization (BPO), which conducts preference learning with datasets containing negative responses bootstrapped from the model itself. Specifically, we propose the following two strategies: 1) using distorted image inputs to the MLLM to elicit responses that contain signified pretraining bias; 2) leveraging a text-based LLM to explicitly inject erroneous but common elements into the original response. Those undesirable responses are paired with the annotated responses to construct the preference dataset, which is subsequently utilized to perform preference learning. Our approach effectively suppresses pretrained LLM bias, enabling enhanced grounding in visual inputs. Extensive experimentation demonstrates significant performance improvements across multiple benchmarks, advancing the state-of-the-art in multimodal conversational systems.
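The preference-learning step described above can be sketched with a DPO-style pairwise objective: each "chosen" response is the annotated ground truth, and each "rejected" response is bootstrapped from the model itself (via a distorted image or LLM-injected errors). This is a minimal illustration assuming a DPO-like loss; the paper's exact objective and hyperparameters may differ, and the pairing helpers are hypothetical.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style preference loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under
    the policy being trained and under a frozen reference model.
    This is an illustrative sketch, not the paper's exact objective.
    """
    # Implicit rewards: log-ratio of policy vs. reference model.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # Negative log-sigmoid of the margin: loss shrinks as the policy
    # prefers the annotated response over the bootstrapped negative.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# BPO-style pair construction (hypothetical data layout):
#   chosen   = annotated ground-truth response for the image
#   rejected = model's response to a distorted image, OR the original
#              response with erroneous elements injected by a text LLM
```

When the policy assigns a higher relative log-probability to the annotated response than to the bootstrapped negative, the loss falls below log 2; when it prefers the negative, the loss grows, pushing the model away from its pretraining bias.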