Large Language Models in the Clinic: A Comprehensive Benchmark

FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL); Computer Science - Artificial Intelligence (cs.AI)

DOI: 10.48550/arxiv.2405.00716

Publication Date: 2024-01-01

AUTHORS (19)

Liu, Fenglin

Li, Zheng

Zhou, Hongjian

Yin, Qingyu

Yang, Jingfeng

Tang, Xianfeng

Luo, Chen

Zeng, Ming

Jiang, Haoming

Gao, Yifan

Nigam, Priyanka

Nag, Sreyashi

Yin, Bing

Hua, Yining

Zhou, Xuan

Rohanian, Omid

Thakur, Anshul

Clifton, Lei

Clifton, David A.

ABSTRACT

The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark, ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and clinical tasks that are complex but common in real-world practice, e.g., open-ended decision-making, long document processing, and emerging drug analysis. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs. The benchmark data is available at https://github.com/AI-in-Health/ClinicBench.

Accepted at EMNLP 2024 Main Conference
