Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study

DOI: 10.2196/52095
Publication date: 2024-05-16
ABSTRACT
Background: Large language models (LLMs) have the potential to support promising new applications in health informatics. However, practical data on sample size considerations for fine-tuning LLMs to perform specific tasks in biomedical and policy contexts are lacking.

Objective: This study aims to evaluate sample size and sample selection techniques for fine-tuning LLMs to support improved named entity recognition (NER) for a custom data set of conflicts of interest disclosure statements.

Methods: A random sample of 200 disclosure statements was prepared for annotation. All "PERSON" and "ORG" entities were identified by each of 2 raters, and once appropriate agreement was established, the annotators independently annotated an additional 290 statements. From the 490 annotated documents, 2500 stratified random samples in different size ranges were drawn. The training subsamples were used to fine-tune a selection of language models across 2 model architectures (Bidirectional Encoder Representations from Transformers [BERT] and Generative Pre-trained Transformer [GPT]) for NER, and multiple regression was used to assess the relationship between sample size (sentences), entity density (entities per sentence [EPS]), and trained model performance (F1-score). Additionally, single-predictor threshold regression models were used to evaluate the possibility of diminishing marginal returns from increased sample size or entity density.

Results: Fine-tuned models ranged in topline NER performance from F1-score=0.79 to F1-score=0.96 across architectures. Two-predictor multiple linear regression models were statistically significant, with multiple R2 ranging from 0.6057 to 0.7896 (all P<.001). EPS and the number of sentences were significant predictors of F1-scores in all cases (P<.001), except for the GPT-2_large model, where EPS was not a significant predictor (P=.184). Model thresholds indicate points of diminishing marginal return from increased training sample size measured in sentences, with point estimates of 439 sentences for RoBERTa_large and 527 sentences for GPT-2_large. Likewise, the threshold models indicate diminishing marginal returns for EPS, with point estimates between 1.36 and 1.38.

Conclusions: Relatively modest sample sizes can be used to fine-tune LLMs for NER tasks applied to biomedical text, and the training data should representatively approximate the entity density of production data. Training data quality and a model architecture's intended use (text generation vs text processing or classification) may be as, or more, important than training data volume and model parameter size.
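ILLUSTRATIVE CODE SKETCHES

The Methods describe drawing 2500 stratified random subsamples in different size ranges from the 490 annotated documents. The sketch below shows one way such stratified subsampling could be implemented; the size bins, corpus placeholders, and function name are assumptions for illustration, not the authors' released code.

```python
# Stratified subsampling sketch. The size bins and corpus placeholders are
# hypothetical; the abstract reports only that 2500 stratified samples in
# different size ranges were drawn from 490 annotated documents.
import random

def draw_subsamples(documents, n_samples=2500,
                    size_bins=((10, 50), (51, 150), (151, 400))):
    """Draw stratified random subsamples of annotated documents,
    rotating through size strata so the training sets span
    different sample size ranges."""
    subsamples = []
    for i in range(n_samples):
        lo, hi = size_bins[i % len(size_bins)]      # pick this draw's size stratum
        target = random.randint(lo, min(hi, len(documents)))
        subsamples.append(random.sample(documents, target))
    return subsamples

corpus = [f"doc_{k}" for k in range(490)]           # stands in for the annotated corpus
training_sets = draw_subsamples(corpus)
print(len(training_sets), "subsamples; first has", len(training_sets[0]), "documents")
```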
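The Results report two-predictor multiple linear regressions of F1-score on sample size (sentences) and entity density (EPS), with multiple R2 between 0.6057 and 0.7896. A minimal sketch of that analysis, assuming a hypothetical per-run log file and column names (f1, sentences, eps):

```python
# Two-predictor regression sketch: F1-score ~ sample size + entity density.
# The CSV file and column names are hypothetical stand-ins for a log of
# fine-tuning runs (one row per fine-tuned model).
import pandas as pd
import statsmodels.formula.api as smf

runs = pd.read_csv("fine_tuning_runs.csv")   # columns: f1, sentences, eps
model = smf.ols("f1 ~ sentences + eps", data=runs).fit()
print(model.summary())                       # multiple R-squared and per-predictor P values
```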
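The diminishing-marginal-return thresholds (439 sentences for RoBERTa_large, 527 for GPT-2_large, and EPS point estimates of 1.36 to 1.38) come from single-predictor threshold regression. One simple way to sketch that idea is a hinge (piecewise-linear) model fit by nonlinear least squares; the synthetic data and functional form below are assumptions, not the authors' estimation procedure.

```python
# Hinge-model sketch of single-predictor threshold regression: F1 grows with
# sample size up to a threshold tau, after which the slope changes. Data are
# synthetic, constructed to flatten past roughly 450 sentences.
import numpy as np
from scipy.optimize import curve_fit

def hinge(x, b0, b1, b2, tau):
    """Slope b1 below the threshold tau, slope b1 + b2 above it."""
    return b0 + b1 * x + b2 * np.maximum(x - tau, 0.0)

rng = np.random.default_rng(0)
sentences = rng.uniform(50, 1000, 300)
f1 = (0.70 + 0.0004 * sentences
      - 0.00035 * np.maximum(sentences - 450, 0)     # returns diminish past ~450
      + rng.normal(0, 0.01, sentences.size))

params, _ = curve_fit(hinge, sentences, f1, p0=[0.7, 0.0004, -0.0003, 400.0])
print(f"estimated threshold: {params[3]:.0f} sentences")
```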