MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
DOI:
10.48550/arxiv.2502.11903
Publication Date:
2025-02-17
AUTHORS (16)
ABSTRACT
Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason over sustained interactions in real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations on 20 MLLMs indicate an accuracy drop during long-term interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, accumulated assumption of error propagation, and reluctance to say no. To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which records key information from the conversation and reminds the model during its responses, enhancing its conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
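The NOTE-TAKING idea described in the abstract can be illustrated with a minimal sketch: after each turn, key facts are recorded as notes, and those notes are replayed as a reminder before the model answers the next turn. The sketch below is an assumption-laden illustration, not the authors' implementation; `chat_with_notes`, the `generate` callable, and the prompt wording are all hypothetical names chosen for this example.

```python
from typing import Callable, List

# Minimal sketch of a note-taking conversation loop. `generate` stands in
# for any MLLM text-generation call; the paper's actual strategy for
# writing, updating, and surfacing notes may differ from this sketch.

def chat_with_notes(turns: List[str], generate: Callable[[str], str]) -> List[str]:
    """Answer each user turn while keeping running notes of key facts."""
    notes: List[str] = []
    replies: List[str] = []
    for turn in turns:
        # Remind the model of previously recorded facts before it answers.
        context = "Known facts:\n" + "\n".join(f"- {n}" for n in notes)
        replies.append(generate(f"{context}\n\nUser: {turn}\nAssistant:"))
        # Record the key fact from this turn as a note for later turns.
        note = generate(f"Summarize the key fact in one line: {turn}").strip()
        if note:
            notes.append(note)
    return replies

if __name__ == "__main__":
    # Toy stand-in for a real MLLM call, just to make the sketch runnable.
    echo = lambda prompt: prompt.splitlines()[-1]
    print(chat_with_notes(["My cat is named Milo.", "What is my cat's name?"], echo))
```

In a real setting, the note-extraction step would also need to update or discard stale notes when facts change, since the abstract identifies inadequate factual updating as one of the four failure patterns.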