Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Automated reasoning
DOI: 10.48550/arxiv.2401.06805 Publication Date: 2024-01-01
ABSTRACT
Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. In particular, various MLLMs, each with distinct model architectures, training data, and training stages, have been evaluated across a broad range of MLLM benchmarks. These studies have, to varying degrees, revealed different aspects of the current capabilities of MLLMs. However, the reasoning abilities of MLLMs have not been systematically investigated. In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions. We believe our survey establishes a solid base and sheds light on this important topic, multimodal reasoning.