ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2406.08164
Publication Date: 2024-06-12
ABSTRACT
Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in such reasoning tasks. This prompts a crucial question: have VLMs effectively tackled the CR challenge? We conjecture that existing CR benchmarks may not adequately push the boundaries of modern VLMs due to their reliance on an LLM-only negative text generation pipeline. Consequently, the negatives produced either appear as outliers from the natural language distribution learned by the VLMs' LLM decoders or as improbable within the corresponding image context. To address these limitations, we introduce ConMe -- a CR benchmark built on a novel data generation pipeline that leverages VLMs themselves to produce `hard CR Q&A'. Through a new concept of VLMs conversing with each other to collaboratively expose their weaknesses, our pipeline autonomously generates, evaluates, and selects challenging compositional reasoning questions, establishing a robust CR benchmark, which is also subsequently validated manually. Our benchmark provokes a noteworthy decrease in CR performance, up to 33% compared to preceding benchmarks, reinstating the CR challenge even for state-of-the-art VLMs.
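The abstract's central mechanism is a generate-evaluate-select loop in which VLMs expose each other's weaknesses: one model proposes an image-grounded hard negative for a question, and the item is kept only if it fools the model under evaluation. Below is a minimal Python sketch of that loop. It is an illustrative assumption of how such a pipeline could be wired up, not the paper's actual implementation; the `VLM` callable interface, the prompts, and the names `generate_hard_negative`, `fools_model`, and `build_benchmark` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# A "VLM" here is any callable mapping (image_path, prompt) -> generated text.
# This interface is a placeholder, not an API from the paper.
VLM = Callable[[str, str], str]

@dataclass
class QAItem:
    image: str
    question: str
    correct: str
    negative: str

def generate_hard_negative(generator: VLM, image: str, question: str, correct: str) -> str:
    """Ask a strong VLM, with the image in context, to propose a plausible
    but incorrect answer (a hard negative) for this question."""
    prompt = (
        f"Question: {question}\nCorrect answer: {correct}\n"
        "Propose an answer that is plausible for this image but factually wrong."
    )
    return generator(image, prompt)

def fools_model(candidate: VLM, item: QAItem) -> bool:
    """Selection step: keep the item only if the candidate VLM prefers the
    negative, i.e. the question is genuinely hard for it."""
    prompt = (
        f"{item.question}\n(a) {item.correct}\n(b) {item.negative}\n"
        "Answer with (a) or (b)."
    )
    return "(b)" in candidate(item.image, prompt)

def build_benchmark(generator: VLM, candidate: VLM,
                    seed_qa: list[tuple[str, str, str]]) -> list[QAItem]:
    """Generate a hard negative per seed (image, question, answer) triple and
    retain only the items that the candidate model gets wrong."""
    kept = []
    for image, question, correct in seed_qa:
        negative = generate_hard_negative(generator, image, question, correct)
        item = QAItem(image, question, correct, negative)
        if fools_model(candidate, item):
            kept.append(item)
    return kept

if __name__ == "__main__":
    # Stub "models" so the sketch runs without any real VLM backend.
    generator = lambda image, prompt: "a blue mug"
    candidate = lambda image, prompt: "(b)"
    bench = build_benchmark(
        generator, candidate,
        [("img1.jpg", "What is on the table?", "a red mug")],
    )
    print(len(bench), "hard item(s) kept")
```

In this sketch the adversarial pressure comes entirely from the selection filter: only questions that the evaluated model actually fails survive, which is one plausible way to realize the up-to-33% performance drop the abstract reports.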