Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
DOI:
10.48550/arxiv.2311.16103
Publication Date:
2023-11
AUTHORS (8)
ABSTRACT
Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries. In pursuit of the ultimate goal of achieving artificial general intelligence, a truly intelligent Video-LLM model should not only see and understand the surroundings, but also possess human-level commonsense, and make well-informed decisions for the users. To guide the development of such a model, the establishment of a robust and comprehensive evaluation system becomes crucial. To this end, this paper proposes \textit{Video-Bench}, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs. The benchmark comprises 10 meticulously crafted tasks, evaluating the capabilities of Video-LLMs across three distinct levels: Video-exclusive Understanding, Prior Knowledge-based Question-Answering, and Comprehension and Decision-making. In addition, we introduce an automatic toolkit tailored to process model outputs for various tasks, facilitating the calculation of metrics and the generation of convenient final scores. We evaluate 8 representative Video-LLMs using \textit{Video-Bench}. The findings reveal that current Video-LLMs still fall considerably short of achieving human-like comprehension and analysis of real-world videos, offering valuable insights for future research directions. The benchmark and toolkit are available at: \url{https://github.com/PKU-YuanGroup/Video-Bench}.
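EVALUATION SKETCH
The abstract describes a scoring flow: collect each model's free-form answers, map them to multiple-choice options, compute per-task accuracy, and aggregate tasks into the three ability levels. The sketch below illustrates that flow under stated assumptions; the task grouping, record fields, and regex-based answer matching are hypothetical placeholders, not the official toolkit's implementation (see the repository above for that).

```python
"""
Minimal sketch of a Video-Bench-style scoring pass. The LEVELS grouping,
record schema ({"task", "model_answer", "ground_truth"}), and letter-matching
heuristic are illustrative assumptions, not the official toolkit.
"""
import json
import re
from collections import defaultdict

# Hypothetical grouping of tasks into the paper's three ability levels.
LEVELS = {
    "Video-exclusive Understanding": ["task_a", "task_b", "task_c"],
    "Prior Knowledge-based Question-Answering": ["task_d", "task_e"],
    "Comprehension and Decision-making": ["task_f", "task_g"],
}

def extract_choice(answer: str) -> str | None:
    """Pull the first standalone option letter (A-D) out of a free-form reply."""
    match = re.search(r"\b([A-D])\b", answer.upper())
    return match.group(1) if match else None

def score_records(records):
    """Compute per-task accuracy, then average task scores within each level."""
    correct, total = defaultdict(int), defaultdict(int)
    for rec in records:
        total[rec["task"]] += 1
        if extract_choice(rec["model_answer"]) == rec["ground_truth"]:
            correct[rec["task"]] += 1
    task_acc = {t: correct[t] / total[t] for t in total}
    level_scores = {}
    for level, tasks in LEVELS.items():
        scored = [task_acc[t] for t in tasks if t in task_acc]
        # Average only over tasks that actually have predictions.
        level_scores[level] = sum(scored) / max(1, len(scored))
    return task_acc, level_scores

if __name__ == "__main__":
    # Toy records standing in for one model's dumped predictions.
    demo = [
        {"task": "task_a", "model_answer": "The answer is B.", "ground_truth": "B"},
        {"task": "task_d", "model_answer": "I think (C) is correct.", "ground_truth": "A"},
        {"task": "task_f", "model_answer": "A", "ground_truth": "A"},
    ]
    per_task, per_level = score_records(demo)
    print(json.dumps({"task_accuracy": per_task, "level_scores": per_level}, indent=2))
```

Matching a bare option letter is the simplest way to normalize free-form LLM replies; a production toolkit would typically fall back to fuzzy matching against the option text when no letter is present.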