MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

DOI: 10.48550/arxiv.2308.02490
Publication Date: 2023-08-04
ABSTRACT
We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancements pose challenges to evaluation benchmark development. Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed based on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model being able to integrate different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from the capability combination. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs. The evaluator enables the evaluation across different question types and answer styles, resulting in a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models. Code and data are available at https://github.com/yuweihao/MM-Vet.
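The LLM-based evaluator and unified metric described in the abstract can be sketched in a few lines. This is a minimal sketch, not the paper's released implementation: the `query_llm` helper and the grading prompt wording are hypothetical stand-ins (MM-Vet's released code uses GPT-4 behind a few-shot grading prompt); the only parts taken from the abstract are the idea of an LLM scoring open-ended answers against ground truth and averaging those scores into one metric.

```python
from statistics import mean

# Hypothetical stand-in for any LLM completion API; plug in a real client
# (e.g., a GPT-4 call, as in MM-Vet's released code) to use this sketch.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client of your choice")

# Illustrative grading prompt (assumed wording, not the paper's template):
# the LLM compares a model's open-ended answer against the ground truth
# and emits a single correctness score in [0, 1].
GRADER_TEMPLATE = (
    "Compare the model answer to the ground truth and output only a "
    "correctness score between 0.0 and 1.0.\n"
    "Question: {question}\n"
    "Ground truth: {ground_truth}\n"
    "Model answer: {answer}\n"
    "Score:"
)

def score_sample(question: str, ground_truth: str, answer: str) -> float:
    """Ask the LLM grader for one sample's score, clamped to [0, 1]."""
    reply = query_llm(GRADER_TEMPLATE.format(
        question=question, ground_truth=ground_truth, answer=answer))
    return min(max(float(reply.strip()), 0.0), 1.0)

def unified_score(samples: list[dict]) -> float:
    """Unified metric: mean per-sample LLM score, reported as a percentage."""
    return 100.0 * mean(
        score_sample(s["question"], s["ground_truth"], s["answer"])
        for s in samples
    )
```

Because the grader returns a number rather than a class label, the same scoring loop works unchanged across question types (short-answer, open-ended, numeric), which is what allows a single averaged metric.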