NFDI4DS | UHH-SEMS - Publication Details

Performance of Retrieval-Augmented-Generation large language models in guideline-concordant PSA testing: A comparative study against junior clinicians (Preprint)

DOI: 10.2196/preprints.78393
Publication Date: 2025-06-03T04:55:07Z


AUTHORS (14)

Joshua Yi Min Tung

Quan Le

Jinxuan Yao

Yifei Huang

Daniel Yan Zheng Lim

Gerald Gui Ren Sng

Rachel Shu En Lau

Yu Guang Tan

Kenneth Chen

Kae Jack Tay

Jen Hong Tan

John Shyi Peng Yuen

Christopher Wai S...

Henry Sun Sien Ho

ABSTRACT

BACKGROUND
Society guidelines for prostate cancer screening via prostate-specific antigen (PSA) testing standardize patient care and are often used by trainees, junior staff, and generalist medical practitioners to guide decision-making. Adhering to these guidelines is time-consuming and challenging, and rates of inappropriate PSA testing are high.

OBJECTIVE
This study evaluates a large language model (LLM) enhanced with retrieval-augmented generation (RAG) and grounded in current European Association of Urology (EAU) and American Urological Association (AUA) guidelines, and compares its ability to provide guideline-concordant PSA screening recommendations with that of junior clinicians.

METHODS
A RAG pipeline was developed and used to process 44 fictional case scenarios. Five junior clinicians provided PSA testing recommendations for the same scenarios in both closed-book and open-book formats. Each answer was scored as correct or incorrect (a binary outcome), and accuracy was compared between the RAG-LLM tool and the clinicians.

RESULTS
The RAG-LLM tool gave guideline-concordant recommendations in 95.5% of case scenarios, compared with 62.3% for junior clinicians in the closed-book format and 74.1% in the open-book format. Both differences were statistically significant (closed-book: p < 0.001; open-book: p < 0.001).

CONCLUSIONS
RAG techniques allow LLMs to integrate complex guidelines into day-to-day medical decision-making. RAG-LLM tools in urology can enhance clinical decision-making by providing guideline-concordant recommendations for PSA testing, potentially improving the consistency of healthcare delivery, reducing clinicians' cognitive load, and reducing unnecessary investigations and costs.
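The abstract does not state which statistical test produced p < 0.001. As a rough consistency check only, the sketch below runs an exact one-sided binomial test under two assumptions: (1) the RAG-LLM tool answered 42 of 44 scenarios correctly, since 42/44 rounds to the reported 95.5%; and (2) the null hypothesis fixes the success rate at the clinicians' pooled accuracy (62.3% closed-book, 74.1% open-book). This is not necessarily the authors' analysis. Treating the clinician rate as a known constant ignores its own sampling variability, which a two-sample comparison would account for. The helper name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_CASES = 44  # number of fictional case scenarios (from the abstract)

# Assumption: 95.5% of 44 scenarios corresponds to 42 correct answers.
RAG_CORRECT = round(0.955 * N_CASES)

# Assumed null hypothesis for illustration: the RAG-LLM is no more accurate
# than the pooled junior-clinician accuracy reported for each format.
CLINICIAN_RATES = {"closed-book": 0.623, "open-book": 0.741}

for fmt, p0 in CLINICIAN_RATES.items():
    p_value = binomial_upper_tail(RAG_CORRECT, N_CASES, p0)
    print(f"{fmt}: {RAG_CORRECT}/{N_CASES} vs p0={p0}: one-sided p = {p_value:.2e}")
```

Under these assumptions both one-sided p-values fall below 0.001, in line with the reported significance. A different test, such as a two-proportion comparison that also models clinician variability, could give different exact values.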
