Consensus features nested cross-validation

Keywords: Machine Learning; Depressive Disorder, Major; Consensus; Research Design; Humans; Reproducibility of Results
DOI: 10.1093/bioinformatics/btaa046
Publication Date: 2020-01-20T20:10:46Z
ABSTRACT
Summary: Feature selection can improve the accuracy of machine-learning models, but appropriate steps must be taken to avoid overfitting. Nested cross-validation (nCV) is a common approach that chooses the classification model and features to represent a given outer fold based on the features that give the maximum inner-fold accuracy. Differential privacy is a related technique to avoid overfitting that uses a privacy-preserving noise mechanism to identify features that are stable between training and holdout sets. We develop consensus nested cross-validation (cnCV), which combines the idea of feature stability from differential privacy with nCV. Feature selection is applied in each inner fold, and the consensus of top features across folds is used as a measure of feature stability or reliability instead of classification accuracy, which is used in standard nCV. We use simulated data with main effects, correlation and interactions to compare the performance of the new cnCV with standard nCV, Elastic Net optimized by cross-validation and private evaporative cooling (pEC). We also compare these methods using real RNA-seq data from a study of major depressive disorder. The cnCV method has similar validation accuracy but much shorter run times because it does not construct classifiers in the inner folds. It gives a more parsimonious feature set with fewer false positives than pEC and selects features without the need to specify a noise threshold. We show that cnCV is an effective and efficient approach for combining feature selection with classification.

Availability and implementation: Code available at https://github.com/insilico/cncv.

Supplementary information: Supplementary data are available at Bioinformatics online.
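The core idea of cnCV described above (feature selection in each inner fold, with the consensus of top features across folds replacing inner-fold classifier accuracy) can be sketched as follows. This is a minimal illustration, not the authors' implementation (see https://github.com/insilico/cncv): the univariate mean-difference scorer, the function names and the fold counts are all assumptions made for the sketch; the paper's pipeline typically uses relief-based feature scoring.

```python
# Hedged sketch of consensus nested cross-validation (cnCV).
# Assumption: a simple univariate scorer stands in for the real
# feature-selection method (e.g. ReliefF in the insilico/cncv package).
import numpy as np
from sklearn.model_selection import StratifiedKFold

def top_features(X, y, k):
    """Return the indices of the k features with the largest absolute
    class-mean difference (illustrative univariate score)."""
    scores = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return set(np.argsort(scores)[::-1][:k])

def consensus_ncv(X, y, n_outer=5, n_inner=5, k=20, seed=0):
    """Select features by consensus across inner and outer folds.
    No classifiers are trained in the inner folds: feature stability
    (set intersection of top-k lists) replaces inner-fold accuracy."""
    outer = StratifiedKFold(n_outer, shuffle=True, random_state=seed)
    consensus_per_outer = []
    for train_idx, _ in outer.split(X, y):
        Xtr, ytr = X[train_idx], y[train_idx]
        inner = StratifiedKFold(n_inner, shuffle=True, random_state=seed)
        # Top-k features from each inner training split.
        fold_sets = [top_features(Xtr[i], ytr[i], k)
                     for i, _ in inner.split(Xtr, ytr)]
        # Consensus = features selected in every inner fold.
        consensus_per_outer.append(set.intersection(*fold_sets))
    # Final feature set: consensus across the outer folds as well.
    return set.intersection(*consensus_per_outer)
```

Because only rankings and set intersections are computed in the inner loop, the run-time advantage over standard nCV noted in the abstract follows directly: no inner-fold classifiers are fit.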