NFDI4DS | UHH-SEMS - Publication Details

Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians

03 Medical and health sciences; 0302 Clinical medicine; Research

DOI: 10.1007/s00345-024-05399-y
Publication Date: 2024-12-27T05:29:15Z

AUTHORS (10)

Eric J. Robinson

Chunyuan Qiu

Stuart Sands

Mohammad Khan

Shivang Vora

Kenichiro Oshima

Khang Nguyen

L. Andrew DiFronzo

David Rhew

Mark I. Feng

ABSTRACT

Purpose: To evaluate the accuracy, comprehensiveness, empathetic tone, and patient preference for AI and urologist responses to patient messages concerning common benign prostatic hyperplasia (BPH) questions across phases of care.

Methods: A cross-sectional study evaluated responses to 20 BPH-related questions generated by 2 AI chatbots and 4 urologists in a simulated clinical messaging environment without direct patient interaction. Experts assessed the accuracy, completeness, and empathetic tone of the responses using Likert scales; non-medical evaluators rated their preferences and their perception of authorship (chatbot vs. human).

Results: Five non-medical volunteers independently evaluated, ranked, and inferred the source of 120 responses (n = 600 evaluations in total). In volunteer evaluations, the mean (SD) empathy score of chatbots, 3.0 (1.4) (moderately empathetic), was significantly higher than that of urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001). The mean (SD) preference ranking for chatbots, 2.6 (1.6), was significantly better than that for urologists, 3.9 (1.6) (p < 0.001), where a lower rank indicates greater preference. Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologists and 2 chatbots; n = 240 evaluations in total). In SME evaluations, the mean (SD) accuracy score for chatbots was 4.5 (1.1) (nearly all correct), not significantly different from that of urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots was 2.4 (0.8) (comprehensive), significantly higher than that of urologists, 1.6 (0.6) (adequate) (p < 0.001).

Conclusion: Experts rated chatbot-generated answers to patient BPH messages as equally accurate as, and more complete than, urologist answers. Non-medical volunteers preferred chatbot-generated messages and rated them as more empathetic than the answers written by urologists.
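
The evaluation counts reported in the abstract follow directly from the study design. The short tally below reproduces them. It is a sketch that assumes each urologist and each chatbot answered every question exactly once and that every evaluator rated every response; the abstract's figures are consistent with this, but the variable names are illustrative only.

```python
# Study design figures as stated in the abstract
QUESTIONS = 20   # BPH-related patient questions
UROLOGISTS = 4
CHATBOTS = 2
VOLUNTEERS = 5   # non-medical evaluators
SMES = 2         # subject matter experts

# Assumption: every responder answered every question once
responses = QUESTIONS * (UROLOGISTS + CHATBOTS)

# Assumption: every evaluator rated every response
volunteer_evaluations = VOLUNTEERS * responses
sme_evaluations = SMES * responses

print(responses, volunteer_evaluations, sme_evaluations)  # prints: 120 600 240
```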
