Performance of large language models (LLMs) in providing prostate cancer information

Keywords: Patient Education; Grade Level
DOI: 10.1186/s12894-024-01570-0 · Publication Date: 2024-08-23
ABSTRACT
The diagnosis and management of prostate cancer (PCa), the second most common cancer in men worldwide, are highly complex. Hence, patients often seek knowledge through additional resources, including AI chatbots such as ChatGPT and Google Bard. This study aimed to evaluate the performance of large language models (LLMs) in providing patient education on PCa. Common patient questions about PCa were collected from reliable educational websites and evaluated for accuracy, comprehensiveness, readability, and stability by two independent board-certified urologists, with a third resolving discrepancies. Accuracy was measured on a 3-point scale, comprehensiveness on a 5-point Likert scale, and readability using the Flesch Reading Ease (FRE) score and the Flesch–Kincaid (FK) Grade Level. A total of 52 questions covering general knowledge, diagnosis, treatment, and prevention were provided to three LLMs. Although there was no significant difference in overall accuracy among the LLMs, ChatGPT-3.5 demonstrated superiority over the other models (p = 0.018). ChatGPT-4 achieved greater comprehensiveness than Bard (p = 0.028). For readability, ChatGPT-3.5 generated simpler sentences, with the highest FRE score (54.7, p < 0.001) and the lowest reading grade level (10.2, p < 0.001). LLMs, particularly ChatGPT-3.5, can generate accurate, comprehensive, and easily readable PCa material. These models might not replace healthcare professionals but can assist in patient guidance.
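The readability metrics used in the study are simple closed-form formulas over sentence, word, and syllable counts. A minimal sketch of both is given below; the syllable counter is a naive vowel-group heuristic (an assumption for illustration — published studies typically use validated tools rather than this approximation):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups; at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE score, FK grade level) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)           # mean words per sentence
    spw = syllables / len(words)                # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch Reading Ease (higher = easier)
    fk = 0.39 * wps + 11.8 * spw - 15.59        # Flesch-Kincaid Grade Level
    return fre, fk
```

An FRE of 54.7 falls in the "fairly difficult" band, and an FK level of 10.2 corresponds to roughly a tenth-grade reading level, which is how the abstract's figures should be interpreted.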