Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
DOI:
10.48550/arxiv.2311.16103
Publication Date:
2023-11
AUTHORS (8)
ABSTRACT
Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries. In pursuit of the ultimate goal of achieving artificial general intelligence, a truly intelligent Video-LLM model should not only see and understand the surroundings, but also possess human-level commonsense, and make well-informed decisions for the users. To guide the development of such a model, the establishment of a robust and comprehensive evaluation system becomes crucial. To this end, this paper proposes \textit{Video-Bench}, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs. The benchmark comprises 10 meticulously crafted tasks, evaluating the capabilities of Video-LLMs across three distinct levels: Video-exclusive Understanding, Prior Knowledge-based Question-Answering, and Comprehension and Decision-making. In addition, we introduce an automatic toolkit tailored to process model outputs for various tasks, facilitating the calculation of metrics and the generation of convenient final scores. We evaluate 8 representative Video-LLMs using \textit{Video-Bench}. The findings reveal that current Video-LLMs still fall considerably short of achieving human-like comprehension and analysis of real-world videos, offering valuable insights for future research directions. The benchmark and toolkit are available at: \url{https://github.com/PKU-YuanGroup/Video-Bench}.
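EVALUATION SKETCH
The abstract describes a scoring flow: collect each model's free-form answers, map them to multiple-choice options, compute per-task accuracy, and aggregate tasks into the three ability levels. The sketch below illustrates that flow under stated assumptions; the task grouping, record fields, and regex-based answer matching are hypothetical placeholders, not the official toolkit's implementation (see the repository above for that).

```python
"""
Minimal sketch of a Video-Bench-style scoring pass. The LEVELS grouping,
record schema ({"task", "model_answer", "ground_truth"}), and letter-matching
heuristic are illustrative assumptions, not the official toolkit.
"""
import json
import re
from collections import defaultdict

# Hypothetical grouping of tasks into the paper's three ability levels.
LEVELS = {
    "Video-exclusive Understanding": ["task_a", "task_b", "task_c"],
    "Prior Knowledge-based Question-Answering": ["task_d", "task_e"],
    "Comprehension and Decision-making": ["task_f", "task_g"],
}

def extract_choice(answer: str) -> str | None:
    """Pull the first standalone option letter (A-D) out of a free-form reply."""
    match = re.search(r"\b([A-D])\b", answer.upper())
    return match.group(1) if match else None

def score_records(records):
    """Compute per-task accuracy, then average task scores within each level."""
    correct, total = defaultdict(int), defaultdict(int)
    for rec in records:
        total[rec["task"]] += 1
        if extract_choice(rec["model_answer"]) == rec["ground_truth"]:
            correct[rec["task"]] += 1
    task_acc = {t: correct[t] / total[t] for t in total}
    level_scores = {}
    for level, tasks in LEVELS.items():
        scored = [task_acc[t] for t in tasks if t in task_acc]
        # Average only over tasks that actually have predictions.
        level_scores[level] = sum(scored) / max(1, len(scored))
    return task_acc, level_scores

if __name__ == "__main__":
    # Toy records standing in for one model's dumped predictions.
    demo = [
        {"task": "task_a", "model_answer": "The answer is B.", "ground_truth": "B"},
        {"task": "task_d", "model_answer": "I think (C) is correct.", "ground_truth": "A"},
        {"task": "task_f", "model_answer": "A", "ground_truth": "A"},
    ]
    per_task, per_level = score_records(demo)
    print(json.dumps({"task_accuracy": per_task, "level_scores": per_level}, indent=2))
```

Matching a bare option letter is the simplest way to normalize free-form LLM replies; a production toolkit would typically fall back to fuzzy matching against the option text when no letter is present.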