Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology
DOI: 10.48550/arxiv.2409.13902
Publication Date: 2024-09-20
AUTHORS (22)
ABSTRACT
Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses lacking supporting evidence or based on hallucinated evidence. While Retrieval Augmented Generation (RAG) is popular for addressing this issue, few studies have implemented and evaluated RAG in downstream domain-specific applications. We developed a RAG pipeline with 70,000 ophthalmology-specific documents that retrieves relevant documents to augment LLMs during inference time. In a case study on long-form consumer health questions, we systematically evaluated the responses of LLMs with and without RAG, including over 500 references, on 100 questions with 10 healthcare professionals. The evaluation focused on the factuality of evidence, selection and ranking of evidence, attribution of evidence, and answer accuracy and completeness. LLMs without RAG provided 252 references in total, of which 45.3% were hallucinated, 34.1% contained minor errors, and 20.6% were correct. In contrast, LLMs with RAG significantly improved accuracy (54.5% being correct) and reduced error rates (18.8% hallucinations and 26.7% minor errors). 62.5% of the top 10 references retrieved by RAG were selected as the top references in the LLM response, with an average ranking of 4.9. The use of RAG also improved evidence attribution (increasing from 1.85 to 2.49 on a 5-point scale, P<0.001), albeit with slight decreases in answer accuracy (from 3.52 to 3.23, P=0.03) and completeness (from 3.47 to 3.27, P=0.17). These results demonstrate that LLMs frequently exhibited hallucinated and erroneous evidence in their responses, raising concerns for downstream applications in the medical domain. RAG substantially reduced the proportion of such evidence but still encountered challenges.
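
The abstract does not detail the pipeline's implementation, so the following is a minimal Python sketch of the general RAG pattern it describes: retrieve top-ranked domain documents for a question, then assemble them into the LLM prompt so the answer can cite evidence. The bag-of-words retriever, all function names, and the sample corpus are illustrative assumptions, not the authors' code; a production system would use a dense or BM25 retriever over the 70,000-document ophthalmology corpus.

# Minimal sketch of a retrieval-augmented generation (RAG) pipeline for
# domain-specific question answering. Everything here is an illustrative
# assumption: the paper's actual retriever, corpus, and LLM are not
# described at this level in the abstract.
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    return [t.lower().strip(".,?") for t in text.split()]

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query: str, corpus: list[str], k: int = 10) -> list[str]:
    # Rank documents by similarity to the query (stand-in for a real
    # dense or BM25 retriever) and return the top-k as candidate evidence.
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(question: str, evidence: list[str]) -> str:
    # Inject retrieved passages into the prompt so the LLM grounds its
    # answer in (and cites) the references instead of parametric memory.
    refs = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(evidence))
    return (
        "Answer the consumer health question using ONLY the references "
        "below, citing them as [n].\n\n"
        f"References:\n{refs}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    corpus = [
        "Glaucoma is a group of eye conditions that damage the optic nerve.",
        "Cataract surgery replaces the clouded lens with an artificial one.",
        "Elevated intraocular pressure is a major risk factor for glaucoma.",
    ]
    question = "What causes glaucoma?"
    prompt = build_prompt(question, retrieve(question, corpus, k=2))
    print(prompt)  # this prompt would be sent to the LLM at inference time

The prompt-assembly step is where the study's attribution gains would arise: the LLM selects among the retrieved references (62.5% of the top 10 retrieved were chosen in responses) rather than citing from memory.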