The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

FOS: Computer and information sciences; Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2404.05904
Publication Date: 2024-04-08
ABSTRACT
Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations" -- outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and compare the tendency of each model to produce hallucinations. The leaderboard uses a comprehensive set of benchmarks focusing on different aspects of hallucinations, such as factuality and faithfulness, across various tasks, including question-answering, summarisation, and reading comprehension. Our analysis provides insights into the performance of different models, guiding researchers and practitioners in choosing the most reliable models for their applications.
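To make the aggregation idea concrete, below is a minimal Python sketch of how per-benchmark hallucination scores might be combined into a single leaderboard ranking. All model names, scores, and the averaging scheme are hypothetical illustrations; the abstract does not specify the paper's actual evaluation pipeline.

# Minimal sketch (not the paper's actual pipeline): rank models by an
# aggregate hallucination score. Higher = fewer hallucinations. All
# names and numbers below are hypothetical placeholders.
from statistics import mean

# Hypothetical per-benchmark scores, grouped by the two aspects the
# paper highlights: factuality and faithfulness.
scores = {
    "model-a": {"factuality": [0.72, 0.65], "faithfulness": [0.81, 0.77]},
    "model-b": {"factuality": [0.68, 0.70], "faithfulness": [0.74, 0.79]},
}

def overall(model_scores: dict) -> float:
    # Average within each aspect first, then across aspects, so an
    # aspect with more benchmarks does not dominate the final score.
    return mean(mean(v) for v in model_scores.values())

leaderboard = sorted(scores.items(), key=lambda kv: overall(kv[1]), reverse=True)
for rank, (model, s) in enumerate(leaderboard, start=1):
    print(f"{rank}. {model}: {overall(s):.3f}")

Averaging within each aspect before averaging across aspects keeps a heavily-benchmarked aspect from dominating the ranking; the real leaderboard may weight benchmarks and tasks differently.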