RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models

DOI: 10.48550/arxiv.2405.14486 Publication Date: 2024-05-23
ABSTRACT
Large Language Models (LLMs) have shown impressive capabilities but also a concerning tendency to hallucinate. This paper presents RefChecker, a framework that introduces claim-triplets to represent claims in LLM responses, aiming to detect fine-grained hallucinations. In RefChecker, an extractor generates claim-triplets from a response, which are then evaluated by a checker against a reference. We delineate three task settings: Zero, Noisy and Accurate Context, to reflect various real-world use cases. We curated a benchmark spanning various NLP tasks and annotated 11k claim-triplets from 2.1k responses by seven LLMs. RefChecker supports both proprietary and open-source models as the extractor and checker. Experiments demonstrate that claim-triplets enable superior hallucination detection, compared to coarser granularities such as sentence- and sub-sentence-level claims. RefChecker outperforms prior methods by 6.8 to 26.1 points on our benchmark, and its checking results are strongly aligned with human judgments. This work is open sourced at https://github.com/amazon-science/RefChecker
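As a rough illustration of the extract-then-check pipeline the abstract describes, below is a minimal Python sketch of a claim-triplet representation and the per-triplet verification loop. All names here (ClaimTriplet, Label, check_response, and the extract/check callables standing in for the extractor and checker LLMs) are hypothetical, not the API of the released library; consult the linked repository for the actual interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Label(str, Enum):
    """Possible verdicts for a single claim-triplet (illustrative)."""
    ENTAILMENT = "Entailment"        # supported by the reference
    NEUTRAL = "Neutral"              # cannot be verified from the reference
    CONTRADICTION = "Contradiction"  # conflicts with the reference


@dataclass
class ClaimTriplet:
    """A fine-grained claim as a (subject, predicate, object) triple."""
    subject: str
    predicate: str
    object: str


def check_response(
    response: str,
    reference: Optional[str],
    extract: Callable[[str], list[ClaimTriplet]],
    check: Callable[[ClaimTriplet, Optional[str]], Label],
) -> list[tuple[ClaimTriplet, Label]]:
    """Decompose a response into claim-triplets and verify each one.

    `extract` and `check` stand in for the extractor and checker models.
    In the Zero Context setting `reference` would first have to be
    retrieved; in the Noisy and Accurate Context settings it is supplied
    with the input (e.g. retrieved passages or a source document).
    """
    triplets = extract(response)
    return [(t, check(t, reference)) for t in triplets]
```

Decomposing a response into subject-predicate-object triples lets each atomic fact be judged independently against the reference, which is the property the abstract credits for finer-grained detection than sentence- or sub-sentence-level checking.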