An interpretable machine learning framework for opioid overdose surveillance from emergency medical services records
Vectorization (mathematics)
tf–idf
Feature (linguistics)
Feature Engineering
DOI: 10.1371/journal.pone.0292170
Publication Date: 2024-01-30T18:28:20Z
AUTHORS (4)
ABSTRACT
The goal of this study is to develop and validate a lightweight, interpretable machine learning (ML) classifier to identify opioid overdoses in emergency medical services (EMS) records. We conducted a comparative assessment of three feature engineering approaches designed for use with unstructured narrative data. Opioid overdose annotations were provided by two harm reduction paramedics and supporting annotators trained to reliably match expert annotations. Candidate feature engineering techniques included term frequency-inverse document frequency (TF-IDF), a highly performant approach to concept vectorization, and a custom approach based on the count of empirically identified keywords. Each feature set was trained using four model architectures: generalized linear model (GLM), Naïve Bayes, neural network, and Extreme Gradient Boost (XGBoost). Ensembles of trained models were also evaluated. Variable importance was assessed to aid interpretation. Models trained using TF-IDF features ranged from AUROC = 0.59 (95% CI: 0.53-0.66) for Naïve Bayes to AUROC = 0.76 (95% CI: 0.71-0.81) for the neural network. Models trained using concept vectorization features ranged from AUROC = 0.83 (95% CI: 0.78-0.88) to AUROC = 0.89 (95% CI: 0.85-0.94) for the ensemble. The custom features were the most performant, with benchmarks ranging from AUROC = 0.92 (95% CI: 0.88-0.95) with the GLM to AUROC = 0.93 (95% CI: 0.90-0.96). These models achieved positive predictive values (PPV) of 80 to 100%, which represent substantial improvements over previously published EMS encounter classifiers. Application of this approach to county EMS data can productively inform local, targeted initiatives.
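To make the contrast between TF-IDF weighting and a keyword-count feature concrete, here is a minimal Python sketch using only the standard library. It is not the authors' implementation. The narratives, labels, and keyword list are hypothetical, the TF-IDF variant (relative term frequency times log(N/df)) is one common formulation chosen for illustration, and no model is fitted: the raw keyword count is used directly as a score so that AUROC and PPV can be computed.

```python
import math
from collections import Counter

# Hypothetical toy narratives and labels (1 = opioid overdose); not study data.
NARRATIVES = [
    "pt unresponsive pinpoint pupils narcan given with improvement",
    "pt found down agonal respirations naloxone administered",
    "fall from ladder left wrist pain",
    "chest pain radiating to left arm aspirin given",
]
LABELS = [1, 1, 0, 0]

# Hypothetical keyword list; the study's empirically identified keywords
# are not listed in the abstract.
KEYWORDS = {"narcan", "naloxone", "pinpoint", "agonal"}


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; real EMS text needs more)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def keyword_count(text, keywords=KEYWORDS):
    """Custom feature: number of tokens that match the keyword list."""
    return sum(1 for tok in tokenize(text) if tok in keywords)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv(scores, labels, threshold):
    """Positive predictive value: share of flagged records that are true positives."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


tfidf_weights = tfidf(NARRATIVES)
keyword_scores = [keyword_count(n) for n in NARRATIVES]

if __name__ == "__main__":
    print("TF-IDF weights, first narrative:", tfidf_weights[0])
    print("Keyword counts:", keyword_scores)
    print("AUROC (keyword count as score):", auroc(keyword_scores, LABELS))
    print("PPV at threshold 1:", ppv(keyword_scores, LABELS, threshold=1))
```

On this toy data the keyword count separates the two classes perfectly, which says nothing about real-world performance. In practice the count would be one input to a fitted model such as a GLM, and the reported confidence intervals would require a resampling or analytic method that the abstract does not specify.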
REFERENCES (40)
CITATIONS (3)