PhD: A Prompted Visual Hallucination Evaluation Dataset
FOS: Computer and information sciences
Artificial Intelligence (cs.AI)
Computer Vision and Pattern Recognition (cs.CV)
DOI:
10.48550/arxiv.2403.11116
Publication Date:
2024-03-17
AUTHORS (8)
ABSTRACT
The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs). The challenge of hallucination, prevalent in LLMs, also emerges in LVLMs. However, most existing efforts mainly focus on object hallucination in LVLMs, ignoring the diverse types of LVLM hallucinations. In this study, we delve into the Intrinsic Vision-Language Hallucination (IVL-Hallu) issue, thoroughly analyzing different types of IVL-Hallu, their causes, and their reflections. Specifically, we propose several novel IVL-Hallu tasks and categorize them into four types: (a) object hallucination, which arises from the misidentification of objects; (b) attribute hallucination, which is caused by the misidentification of attributes; (c) multi-modal conflicting hallucination, which derives from contradictions between textual and visual information; and (d) counter-common-sense hallucination, which owes to contradictions between the LVLM's knowledge and the actual images. Based on these taxonomies, we propose a more challenging benchmark named PhD to evaluate and explore IVL-Hallu. An automated pipeline is proposed for generating the different types of IVL-Hallu data. Extensive experiments on five SOTA LVLMs reveal their inability to effectively tackle our proposed IVL-Hallu tasks, with detailed analyses and insights into the origins and possible solutions of these tasks, facilitating future research on LVLM hallucination. The benchmark can be accessed at https://github.com/jiazhen-code/IntrinsicHallu
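The abstract's four-way taxonomy implies that benchmark results are reported per hallucination type. The minimal sketch below shows one way such per-category scoring could work; the category names follow the abstract, but the record schema (`category`, `prediction`, `answer` fields) is an illustrative assumption, not the benchmark's actual data format.

```python
from collections import defaultdict

# The four IVL-Hallu types named in the abstract.
CATEGORIES = (
    "object",                  # (a) misidentification of objects
    "attribute",               # (b) misidentification of attributes
    "multi-modal conflicting", # (c) text/image contradictions
    "counter-common-sense",    # (d) model knowledge vs. actual image
)

def per_category_accuracy(records):
    """Return {category: accuracy} over a list of evaluation records.

    Each record is a dict with hypothetical keys:
    "category", "prediction", and "answer".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        correct[r["category"]] += int(r["prediction"] == r["answer"])
    return {c: correct[c] / total[c] for c in total}

# Toy example: two object-hallucination questions, one attribute question.
records = [
    {"category": "object", "prediction": "yes", "answer": "yes"},
    {"category": "object", "prediction": "no", "answer": "yes"},
    {"category": "attribute", "prediction": "no", "answer": "no"},
]
print(per_category_accuracy(records))
# -> {'object': 0.5, 'attribute': 1.0}
```

Breaking accuracy down by category, rather than reporting a single aggregate score, is what lets an evaluation like this attribute failures to a specific hallucination type.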