NFDI4DS | UHH-SEMS - Publication Details

Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, and Model Ensembling Algorithms

FOS: Computer and information sciences; Computer Science - Machine Learning; Statistics - Machine Learning; Imbalance, resampling, regularization, ensemble, risk modeling; 0202 electrical engineering, electronic engineering, information engineering; Machine Learning (stat.ML); 02 engineering and technology; Machine Learning (cs.LG)

DOI: 10.5281/zenodo.2583550 Publication Date: 2019-01-01

AUTHORS (2)

Xuelei Sherry Ni

Yan Wang

ABSTRACT

We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. The area under the receiver operating characteristic curve (ROC AUC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), and two oversampling methods, random oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers are implemented: logistic regression without regularization (LR), L1-regularized LR (L1LR), and a decision tree (DT). Two ensembling techniques, Bagging and Boosting, are applied to the DT classifier for further model improvement. The results show that Boosting on DT, trained on data oversampled via SMOTE to contain 50% positives, is the optimal model, achieving an AUC, recall, and F1 score of 0.8633, 0.9260, and 0.8907, respectively.
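The SMOTE step behind the best-performing configuration, and the AUC used to compare models, can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes numeric feature vectors, Euclidean nearest neighbours computed among minority samples only, and binary labels with 1 as the positive (risk) class. The function names `smote`, `oversample_to_balance`, and `auc` are hypothetical.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points, each placed at a random position on the
    segment between a minority sample and one of its k nearest minority
    neighbours (the core SMOTE interpolation idea)."""
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, p in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(p, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): how far to move toward the neighbour
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def oversample_to_balance(X, y, positive=1, k=5, seed=0):
    """Append SMOTE samples until positives make up 50% of the rows."""
    minority = [x for x, label in zip(X, y) if label == positive]
    n_neg = len(y) - len(minority)
    needed = max(0, n_neg - len(minority))
    new = smote(minority, needed, k=k, seed=seed)
    return list(X) + new, list(y) + [positive] * len(new)


def auc(scores, labels, positive=1):
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as 1/2 (the Mann-Whitney U formulation)."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2],
         [5.0, 5.0], [6.0, 5.5], [5.5, 6.0], [7.0, 7.0], [6.5, 5.0]]
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    X_bal, y_bal = oversample_to_balance(X, y, k=2)
    print(sum(y_bal), len(y_bal))  # 5 10
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

For real workloads, the imbalanced-learn library offers `imblearn.over_sampling.SMOTE` (see its `sampling_strategy` and `k_neighbors` parameters) and scikit-learn offers `sklearn.metrics.roc_auc_score`. When resampling is combined with 10-fold cross-validation, a common precaution is to resample only the training folds, so that each validation fold keeps the original class ratio and the AUC is not inflated by synthetic points.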
