An XGBoost risk model via feature selection and Bayesian hyper-parameter optimization

FOS: Computer and information sciences
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Keywords: XGBoost; extreme gradient boosting; feature selection
DOI: 10.48550/arxiv.1901.08433
Publication Date: 2019-01-01
ABSTRACT
This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimization are simultaneously considered during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and the Bayesian tree-structured Parzen estimator (TPE), are applied in XGBoost. The effect of the different FS and hyper-parameter optimization methods on model performance is investigated by the Wilcoxon signed rank test. XGBoost is compared with the traditionally utilized logistic regression (LR) model in terms of classification accuracy, area under the curve (AUC), recall, and F1 score obtained from 10-fold cross-validation. Results show that hierarchical variable clustering is the optimal FS method for LR, while weight by Chi-square achieves the best performance for XGBoost. Both TPE and RS optimization in XGBoost outperform LR significantly. XGBoost with TPE tuning shows superiority over LR, since it results in significantly higher accuracy and marginally higher AUC, recall, and F1 score. Furthermore, XGBoost with TPE tuning shows lower variability than the RS method. Finally, the ranking of feature importance based on XGBoost enhances model interpretation. Therefore, XGBoost with Bayesian TPE hyper-parameter optimization serves as an operative and powerful approach for business risk modeling.
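The pipeline the abstract describes, chi-square feature weighting followed by TPE-tuned XGBoost scored by 10-fold cross-validation, can be sketched as below. This is an illustrative sketch, not the authors' released code: the stand-in dataset, the number of retained features (k=15), and the hyper-parameter search ranges are assumptions for demonstration only.

    # Sketch: chi-square feature selection, then Bayesian TPE tuning of
    # XGBoost, evaluated by mean 10-fold cross-validated AUC.
    import numpy as np
    from hyperopt import fmin, tpe, hp, Trials
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from xgboost import XGBClassifier

    # Stand-in binary classification data (the paper's business-risk
    # dataset is not public); chi2 requires non-negative features.
    X, y = load_breast_cancer(return_X_y=True)
    X = SelectKBest(chi2, k=15).fit_transform(X, y)  # k is an assumption

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    def objective(params):
        """Mean 10-fold AUC, negated because hyperopt minimizes."""
        model = XGBClassifier(
            n_estimators=int(params["n_estimators"]),
            max_depth=int(params["max_depth"]),
            learning_rate=params["learning_rate"],
            subsample=params["subsample"],
            eval_metric="logloss",
        )
        return -cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

    # Illustrative search space; the paper's actual ranges may differ.
    space = {
        "n_estimators": hp.quniform("n_estimators", 50, 500, 25),
        "max_depth": hp.quniform("max_depth", 2, 10, 1),
        "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
        "subsample": hp.uniform("subsample", 0.5, 1.0),
    }

    trials = Trials()
    best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
    print("best hyper-parameters:", best)

Swapping tpe.suggest for hyperopt's random-search algorithm (rand.suggest) reproduces the RS baseline the abstract compares against.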
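The Wilcoxon signed rank test used for the significance comparisons can likewise be run on paired per-fold scores. Continuing the sketch above (reusing X, y, and cv), with the scaling pipeline for logistic regression chosen here as an assumption:

    # Paired Wilcoxon signed-rank test on per-fold AUC: XGBoost vs. LR.
    # The fixed random_state in cv gives both models identical folds,
    # so the per-fold scores are properly paired.
    from scipy.stats import wilcoxon
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    xgb = XGBClassifier(eval_metric="logloss")

    auc_lr = cross_val_score(lr, X, y, cv=cv, scoring="roc_auc")
    auc_xgb = cross_val_score(xgb, X, y, cv=cv, scoring="roc_auc")

    stat, p = wilcoxon(auc_xgb, auc_lr)
    print(f"Wilcoxon statistic = {stat:.3f}, p-value = {p:.4f}")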
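Finally, the feature-importance ranking that the abstract credits with improving model interpretation comes directly from a fitted XGBoost model. A minimal continuation on the same stand-in data (indices refer to the chi-square-selected subset, not the original columns):

    # Rank the selected features by XGBoost's built-in importance scores.
    final = XGBClassifier(eval_metric="logloss").fit(X, y)
    order = np.argsort(final.feature_importances_)[::-1]
    for rank, idx in enumerate(order[:10], start=1):
        print(f"{rank:2d}. feature {idx}: {final.feature_importances_[idx]:.3f}")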