A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Keywords: XGBoost; extreme gradient boosting; feature selection
DOI: 10.48550/arxiv.1901.08433
Publication Date: 2019-01-01
AUTHORS (2)
ABSTRACT
This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimization are simultaneously considered during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and the Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effect of the different FS and hyper-parameter optimization methods on model performance is investigated by the Wilcoxon Signed Rank Test. XGBoost is compared to the traditionally utilized logistic regression (LR) model in terms of classification accuracy, area under the curve (AUC), recall, and F1 score obtained from 10-fold cross validation. Results show that hierarchical clustering is the optimal FS method for LR, while weight by Chi-square achieves the best performance for XGBoost. Both TPE and RS optimization in XGBoost outperform LR significantly. TPE optimization shows a superiority over RS, since it results in a significantly higher accuracy and a marginally higher AUC, recall, and F1 score. Furthermore, XGBoost with TPE tuning shows lower variability than the RS method. Finally, the ranking of feature importance based on XGBoost enhances model interpretation. Therefore, XGBoost with Bayesian TPE hyper-parameter optimization serves as an operative and powerful approach for business risk modeling.
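To make the described pipeline concrete, below is a minimal Python sketch of the workflow the abstract outlines: Chi-square feature selection, XGBoost tuned by Bayesian TPE (here via the hyperopt library), a logistic regression baseline, 10-fold cross-validated AUC, a Wilcoxon signed rank test on the paired fold scores, and a feature-importance ranking. The synthetic dataset, the number of selected features, and the search-space ranges are illustrative assumptions, not the paper's actual setup.

# A hedged sketch of the paper's pipeline under assumed data and
# hyper-parameter ranges; not the authors' exact configuration.
import numpy as np
from hyperopt import fmin, hp, tpe, Trials
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier

# Synthetic stand-in for the business-risk data (assumption).
X, y = make_classification(n_samples=1000, n_features=30,
                           n_informative=10, random_state=0)

# Chi-square FS requires non-negative inputs, hence the min-max scaling.
# Keeping 15 of 30 features is an illustrative choice.
X_scaled = MinMaxScaler().fit_transform(X)
X_fs = SelectKBest(chi2, k=15).fit_transform(X_scaled, y)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

def objective(params):
    """Negative mean 10-fold AUC, minimized by TPE."""
    model = XGBClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        subsample=params["subsample"],
        eval_metric="logloss",
    )
    auc = cross_val_score(model, X_fs, y, cv=cv, scoring="roc_auc").mean()
    return -auc

# Illustrative search space; the paper's ranges may differ.
space = {
    "n_estimators": hp.quniform("n_estimators", 50, 500, 25),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
    "learning_rate": hp.loguniform("learning_rate",
                                   np.log(0.01), np.log(0.3)),
    "subsample": hp.uniform("subsample", 0.5, 1.0),
}
best = fmin(objective, space, algo=tpe.suggest, max_evals=50,
            trials=Trials())

xgb = XGBClassifier(
    n_estimators=int(best["n_estimators"]),
    max_depth=int(best["max_depth"]),
    learning_rate=best["learning_rate"],
    subsample=best["subsample"],
    eval_metric="logloss",
)
lr = LogisticRegression(max_iter=1000)

# Per-fold AUCs for both models, then a Wilcoxon signed rank test
# on the paired fold scores for the model comparison.
auc_xgb = cross_val_score(xgb, X_fs, y, cv=cv, scoring="roc_auc")
auc_lr = cross_val_score(lr, X_fs, y, cv=cv, scoring="roc_auc")
stat, p = wilcoxon(auc_xgb, auc_lr)
print(f"XGBoost AUC {auc_xgb.mean():.3f} vs LR AUC {auc_lr.mean():.3f}, "
      f"Wilcoxon p={p:.4f}")

# Feature-importance ranking for model interpretation.
xgb.fit(X_fs, y)
ranking = np.argsort(xgb.feature_importances_)[::-1]
print("Feature ranking (most to least important):", ranking)

Swapping tpe.suggest for hyperopt's rand.suggest reproduces the random search (RS) baseline that the paper compares TPE against, with everything else held fixed.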