The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

FOS: Computer and information sciences; Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2404.05904
Publication Date: 2024-04-08
ABSTRACT
Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations" -- outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and compare the tendency of each model to produce hallucinations. The leaderboard uses a comprehensive set of benchmarks focusing on different aspects of hallucinations, such as factuality and faithfulness, across various tasks, including question-answering, summarisation, and reading comprehension. Our analysis provides insights into the performance of different models, guiding researchers and practitioners in choosing the most reliable models for their applications.
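To make the aggregation idea concrete, below is a minimal Python sketch of how per-benchmark hallucination scores might be combined into a single leaderboard ranking. All model names, scores, and the averaging scheme are hypothetical illustrations; the abstract does not specify the paper's actual evaluation pipeline.

# Minimal sketch (not the paper's actual pipeline): rank models by an
# aggregate hallucination score. Higher = fewer hallucinations. All
# names and numbers below are hypothetical placeholders.
from statistics import mean

# Hypothetical per-benchmark scores, grouped by the two aspects the
# paper highlights: factuality and faithfulness.
scores = {
    "model-a": {"factuality": [0.72, 0.65], "faithfulness": [0.81, 0.77]},
    "model-b": {"factuality": [0.68, 0.70], "faithfulness": [0.74, 0.79]},
}

def overall(model_scores: dict) -> float:
    # Average within each aspect first, then across aspects, so an
    # aspect with more benchmarks does not dominate the final score.
    return mean(mean(v) for v in model_scores.values())

leaderboard = sorted(scores.items(), key=lambda kv: overall(kv[1]), reverse=True)
for rank, (model, s) in enumerate(leaderboard, start=1):
    print(f"{rank}. {model}: {overall(s):.3f}")

Averaging within each aspect before averaging across aspects keeps a heavily-benchmarked aspect from dominating the ranking; the real leaderboard may weight benchmarks and tasks differently.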