Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments

DOI: 10.7717/peerj-cs.562 · Publication date: 2021-06-01
ABSTRACT
In this paper, a novel feature selection method for microarray gene expression datasets, called the Robust Proportional Overlapping Score (RPOS), is proposed. It relies on a robust measure of dispersion, the Median Absolute Deviation (MAD). For binary class problems, the method identifies the most discriminative genes by scoring how much the gene expression values of the two classes overlap. Genes with a high degree of overlap between the classes are discarded, and genes that discriminate between the classes are selected. The proposed method is compared with five state-of-the-art gene selection methods on eleven gene expression datasets in terms of classification error, Brier score, and sensitivity. Observations are classified on different sets of genes selected by the proposed method using three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box plots and stability scores of the results are also reported. The results show that the proposed method outperforms the other methods in most cases.
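The abstract does not state the RPOS formula, so the sketch below only illustrates the general idea of a MAD-based overlap score for one gene in a binary problem. Every detail is an assumption of ours, not the authors' definition: each class's "core interval" is taken as median ± k·MAD (k = 1 by default), the score is the fraction of all samples lying inside the intersection of the two class intervals, and the genes with the lowest scores are kept. The function names mad, core_interval, overlap_score, and select_genes are hypothetical.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation: median(|v - median(values)|)."""
    m = median(values)
    return median(abs(v - m) for v in values)


def core_interval(values, k=1.0):
    """Hypothetical robust core interval for one class: median +/- k * MAD."""
    m = median(values)
    d = mad(values)
    return m - k * d, m + k * d


def overlap_score(x, y, k=1.0):
    """Illustrative (not the published RPOS) overlap score for one gene.

    x: expression values of the gene, one per sample.
    y: binary class labels (0 or 1), aligned with x.
    Returns the proportion of samples inside the overlap of the two
    class core intervals: 0.0 means no overlap, 1.0 means full overlap.
    """
    class0 = [v for v, c in zip(x, y) if c == 0]
    class1 = [v for v, c in zip(x, y) if c == 1]
    lo0, hi0 = core_interval(class0, k)
    lo1, hi1 = core_interval(class1, k)
    lo, hi = max(lo0, lo1), min(hi0, hi1)
    if lo > hi:  # the two core intervals are disjoint
        return 0.0
    inside = sum(1 for v in x if lo <= v <= hi)
    return inside / len(x)


def select_genes(X, y, n_genes, k=1.0):
    """Rank genes (columns of X) by ascending overlap score; keep n_genes.

    X is a list of samples (rows), each a list of gene expression values.
    Ties are broken by the lower column index.
    """
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((overlap_score(column, y, k), j))
    scores.sort()
    return [j for _, j in scores[:n_genes]]


if __name__ == "__main__":
    # Toy data: 6 samples, 3 genes; the first three samples are class 0.
    X = [
        [1, 5, 1],
        [2, 6, 2],
        [3, 7, 3],
        [10, 5, 2],
        [11, 6, 3],
        [12, 7, 4],
    ]
    y = [0, 0, 0, 1, 1, 1]
    # Gene 0 separates the classes (score 0.0), gene 2 partly overlaps
    # (4/6), and gene 1 overlaps completely (1.0).
    print(select_genes(X, y, 2))  # prints [0, 2]
```

Because the median and the MAD barely move when a few samples take extreme values, a single outlying measurement cannot stretch a class interval the way it would stretch a mean ± standard deviation interval; this is the sense in which a MAD-based overlap score is robust.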