Automated ICD coding for primary diagnosis via clinically interpretable machine learning

Keywords: Interpretability; Feature Engineering; Hyperparameter
DOI: 10.1016/j.ijmedinf.2021.104543 Publication Date: 2021-07-27T08:45:08Z
ABSTRACT
Computer-assisted clinical coding (CAC) based on automated algorithms has been expected to improve International Classification of Diseases, tenth version (ICD-10) coding quality and productivity, but studies oriented toward primary diagnosis auto-coding are limited in the Chinese context. This study aims to develop a machine learning (ML) model for automated primary diagnosis ICD-10 coding. A total of 71,709 admissions at Fuwai hospital were included to carry out this study, corresponding to 168 codes. Based on clinical implications, two feature engineering methods were used to process discharge and procedure texts into sequential features and grouping features respectively, by which two kinds of models were built and compared. One baseline model using one-hot encoding was also considered. Light Gradient Boosting Machine (LightGBM) was adopted as the classifier, with grid search and cross-validation used to select the optimal hyperparameters. SHapley Additive exPlanations (SHAP) values were applied to give interpretability to the models. Our best prediction model was developed on the engineered features. It showed good performance in the test phase, with accuracy and macro-averaged F1 (Macro-F1) of 95.2% and 88.3% respectively. The comparison of the models demonstrated the effectiveness of the sequential information and grouping strategy in boosting performance (P-value < 0.01). Subgroup analysis of each individual code showed that 91.1% of the codes achieved over 70.0%. In its context, the model's results are interpretable. Hence, it has the potential to assist coders and improve efficiency in inpatient settings.
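The abstract reports both accuracy (95.2%) and Macro-F1 (88.3%). The minimal sketch below illustrates how the two metrics can diverge when some codes are rare, which is plausible with 168 target codes. It is not the authors' evaluation code: the function names, the two ICD-10 codes, and the toy labels are all assumptions chosen for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of admissions whose predicted code equals the true code."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-code F1, so rare codes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted code p, but it was wrong
            fn[t] += 1  # true code t was missed
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, made-up labels: one common code and one rare code (illustrative only).
y_true = ["I25.1"] * 8 + ["I21.0"] * 2
y_pred = ["I25.1"] * 8 + ["I25.1", "I21.0"]

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

In this toy case a single miss on the rare code barely moves accuracy but pulls Macro-F1 down noticeably, because each code contributes equally to the average regardless of how many admissions carry it.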