Diagnostic accuracy of a large language model in rheumatology: comparison of physician and ChatGPT-4
Observational Research
3. Good health
DOI: 10.1007/s00296-023-05464-6
Publication Date: 2023-09-24T06:01:31Z
AUTHORS (4)
ABSTRACT
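Pre-clinical studies suggest that large language models (i.e., ChatGPT) could be used in the diagnostic process to distinguish inflammatory rheumatic diseases (IRD) from other diseases. We therefore aimed to assess the diagnostic accuracy of ChatGPT-4 in comparison to rheumatologists. For the analysis, the data set of Gräf et al. (2022) was used. Previous patient assessments were analyzed using ChatGPT-4 and compared to the rheumatologists’ assessments. ChatGPT-4 listed the correct diagnosis comparably often to rheumatologists as the top diagnosis, 35% vs 39% (p = 0.30), as well as among the top 3 diagnoses, 60% vs 55% (p = 0.38). In IRD-positive cases, ChatGPT-4 provided the correct top diagnosis in 71% vs 62% in the rheumatologists’ analysis. The correct diagnosis was among the top 3 in 86% (ChatGPT-4) vs 74% (rheumatologists). In non-IRD cases, ChatGPT-4 provided the correct top diagnosis in 15% vs 27% in the rheumatologists’ analysis. The correct diagnosis was among the top 3 in non-IRD cases in 46% of the ChatGPT-4 group vs 45% in the rheumatologists’ group. If only the first suggestion for diagnosis was considered, ChatGPT-4 correctly classified 58% of cases as IRD vs 56% for the rheumatologists (p = 0.52). ChatGPT-4 showed a slightly higher accuracy for the top 3 overall diagnoses compared to the rheumatologists’ assessment. ChatGPT-4 was able to provide the correct differential diagnosis in a relevant number of cases and achieved better sensitivity to detect IRDs than the rheumatologists, at the cost of lower specificity. The pilot results highlight the potential of this new technology as a triage tool for the diagnosis of IRD.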