MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
DOI:
10.48550/arxiv.2502.11903
Publication Date:
2025-02-17
AUTHORS (16)
ABSTRACT
Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason over sustained interactions in real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations on 20 MLLMs indicate an accuracy drop during long-term interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, accumulated assumption of error propagation, and reluctance to say no. To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which records key information from the conversation and reminds the model during its responses, enhancing its conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
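The NOTE-TAKING idea described in the abstract can be illustrated with a minimal sketch: after each turn, key facts are recorded as notes, and those notes are replayed as a reminder before the model answers the next turn. The sketch below is an assumption-laden illustration, not the authors' implementation; `chat_with_notes`, the `generate` callable, and the prompt wording are all hypothetical names chosen for this example.

```python
from typing import Callable, List

# Minimal sketch of a note-taking conversation loop. `generate` stands in
# for any MLLM text-generation call; the paper's actual strategy for
# writing, updating, and surfacing notes may differ from this sketch.

def chat_with_notes(turns: List[str], generate: Callable[[str], str]) -> List[str]:
    """Answer each user turn while keeping running notes of key facts."""
    notes: List[str] = []
    replies: List[str] = []
    for turn in turns:
        # Remind the model of previously recorded facts before it answers.
        context = "Known facts:\n" + "\n".join(f"- {n}" for n in notes)
        replies.append(generate(f"{context}\n\nUser: {turn}\nAssistant:"))
        # Record the key fact from this turn as a note for later turns.
        note = generate(f"Summarize the key fact in one line: {turn}").strip()
        if note:
            notes.append(note)
    return replies

if __name__ == "__main__":
    # Toy stand-in for a real MLLM call, just to make the sketch runnable.
    echo = lambda prompt: prompt.splitlines()[-1]
    print(chat_with_notes(["My cat is named Milo.", "What is my cat's name?"], echo))
```

In a real setting, the note-extraction step would also need to update or discard stale notes when facts change, since the abstract identifies inadequate factual updating as one of the four failure patterns.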