Exploring the use of a Large Language Model for data extraction in systematic reviews: a rapid feasibility study
Data extraction
DOI: 10.48550/arxiv.2405.14445
Publication Date: 2024-05-23
AUTHORS (9)
ABSTRACT
This paper describes a rapid feasibility study of using GPT-4, a large language model (LLM), to (semi)automate data extraction in systematic reviews. Despite the recent surge of interest in LLMs, there is still a lack of understanding of how to design LLM-based automation tools and how to robustly evaluate their performance. During the 2023 Evidence Synthesis Hackathon we conducted two studies. Firstly, we used the LLM to automatically extract study characteristics from studies in the human clinical, animal, and social science domains, using studies from each category for prompt development and ten for evaluation. Secondly, we used the LLM to predict the Participants, Interventions, Controls and Outcomes (PICOs) labelled within 100 abstracts from the EBM-NLP dataset. Overall, results indicated an accuracy of around 80%, with some variability between domains (82% for human clinical, 80% for animal, and 72% for human social science studies). Causal inference methods were among the data extraction items with the most errors. In the PICO study, predictions for participants and intervention/control showed high accuracy (>80%), while outcomes were more challenging. Evaluation was done manually; automated scoring methods such as BLEU and ROUGE were of limited value. We observed variability in the predictions and changes in response quality. This paper presents a template for future evaluations of LLMs in the context of systematic review automation. Our results show that LLMs might be of value, for example as second or third reviewers. However, caution is advised when integrating models such as GPT-4 into such tools. Further research on the stability and reliability of LLMs in practical settings is warranted for each type of data processed by the LLM.
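The paper's prompts and extraction schema are not reproduced above, but the workflow it evaluates is straightforward to sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the OpenAI Python client (openai>=1.0), the "gpt-4" chat model, and a hypothetical list of extraction fields.

```python
# Minimal sketch of prompt-based data extraction with GPT-4.
# Assumptions (not from the paper): the OpenAI Python client (openai>=1.0)
# with OPENAI_API_KEY set in the environment, and an illustrative field list.
from openai import OpenAI

client = OpenAI()

FIELDS = ["study design", "population", "intervention", "comparator", "outcomes"]

def extract_characteristics(study_text: str) -> str:
    """Ask the model to return study characteristics as a JSON object."""
    prompt = (
        "Extract the following items from the study below. Respond with a "
        f"JSON object whose keys are exactly {FIELDS}; use \"not reported\" "
        "for items the study does not mention.\n\n" + study_text
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # reduces, but does not eliminate, run-to-run variability
        messages=[
            {"role": "system", "content": "You extract data for systematic reviews."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
```

Even at temperature 0, repeated calls can return different wordings, which is consistent with the variability in predictions and response quality the abstract reports, and one reason the authors suggest using LLMs as second or third reviewers rather than standalone extractors.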
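The abstract's observation that BLEU and ROUGE were of limited value is easy to demonstrate: n-gram overlap metrics penalise extractions that are factually correct but worded differently from the reference. Below is a small sketch, assuming the nltk and rouge-score packages and invented example strings.

```python
# Why n-gram overlap metrics undervalue correct extractions:
# a semantically equivalent answer with a different surface form scores low.
# Assumes: pip install nltk rouge-score (the example strings are invented).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "randomised controlled trial with 120 adult participants"
prediction = "an RCT enrolling 120 adults"  # same fact, different wording

bleu = sentence_bleu(
    [reference.split()],
    prediction.split(),
    smoothing_function=SmoothingFunction().method1,
)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, prediction)["rougeL"]

print(f"BLEU:    {bleu:.3f}")              # near zero despite being correct
print(f"ROUGE-L: {rouge_l.fmeasure:.3f}")  # low despite being correct
```

Scores near zero for a factually correct answer illustrate why the evaluation in the study was done manually.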