Permutation importance: a corrected feature importance measure

Interpretability · Categorical variable · Feature · Relevance · Feature vector
DOI: 10.1093/bioinformatics/btq134 Publication Date: 2010-04-13T02:32:32Z
ABSTRACT
Motivation: In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in recent years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred.
Results: In this work, we introduce a heuristic for normalizing feature importance measures that can correct this bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures in two real-world case studies. We propose an improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models.
Availability: R code for the method presented in this article is available at http://www.mpi-inf.mpg.de/∼altmann/download/PIMP.R
Contact: altmann@mpi-inf.mpg.de, laura.tolosi@mpi-inf.mpg.de
Supplementary information: Supplementary data are available at Bioinformatics online.
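The abstract describes the PIMP heuristic: permute the outcome repeatedly, re-estimate the importance of each variable under these non-informative outcomes, and use the resulting null distribution to assign a P-value to the observed importance. The sketch below illustrates that idea only; it assumes scikit-learn's RandomForestClassifier in place of the RF implementation used by the authors (their R script, PIMP.R, is linked above), and the function name pimp_pvalues, the parameter n_permutations, and the empirical (non-parametric) P-value are illustrative choices, not the authors' code.

# Illustrative PIMP sketch, assuming scikit-learn's RandomForestClassifier.
# Names (pimp_pvalues, n_permutations) are hypothetical, not from PIMP.R.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pimp_pvalues(X, y, n_permutations=100, random_state=0, **rf_kwargs):
    """Empirical P-values for RF feature importances via outcome permutation."""
    rng = np.random.default_rng(random_state)

    # Importance of each feature on the true outcome.
    rf = RandomForestClassifier(random_state=random_state, **rf_kwargs).fit(X, y)
    observed = rf.feature_importances_

    # Null distribution: refit the RF on repeatedly permuted outcomes, so any
    # apparent importance reflects only chance and the bias of the measure.
    null_imps = np.empty((n_permutations, X.shape[1]))
    for s in range(n_permutations):
        y_perm = rng.permutation(y)
        rf_s = RandomForestClassifier(random_state=random_state,
                                      **rf_kwargs).fit(X, y_perm)
        null_imps[s] = rf_s.feature_importances_

    # Empirical P-value: fraction of permutations whose importance reaches or
    # exceeds the observed value (+1 correction to avoid zero P-values).
    pvals = (1 + (null_imps >= observed).sum(axis=0)) / (1 + n_permutations)
    return observed, pvals

Variables whose P-values fall below a chosen significance threshold can then be retained and the RF refit on them alone, in the spirit of the improved RF model mentioned in the Results.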