Applying Mondrian Cross-Conformal Prediction To Estimate Prediction Confidence on Large Imbalanced Bioactivity Data Sets

Mondrian; Data set; Similarity (geometry); Predictive modelling; Applicability domain
DOI: 10.1021/acs.jcim.7b00159 Publication Date: 2017-06-19T19:32:47Z
ABSTRACT
Conformal prediction has been proposed as a more rigorous way to define prediction confidence compared to other applicability domain concepts that have earlier been used for QSAR modeling. One main advantage of such a method is that it provides a prediction region, potentially with multiple predicted labels, which contrasts with the single-valued (regression) or single-label (classification) output predictions produced by standard QSAR modeling algorithms. Standard conformal prediction might not be suitable for imbalanced data sets. Therefore, Mondrian cross-conformal prediction (MCCP), which combines Mondrian inductive conformal prediction with cross-fold calibration sets, has been introduced. In this study, the MCCP method was applied to 18 publicly available data sets with various imbalance levels, varying from 1:10 to 1:1000 (ratio of active/inactive compounds). Our results show that MCCP in general performed well on bioactivity data sets with various imbalance levels. More importantly, the method not only provides prediction confidence and prediction regions, in contrast to standard machine learning methods, but also produces valid predictions for the minority class. In addition, a compound-similarity-based nonconformity measure was investigated. Our results demonstrate that although it gives valid predictions, its efficiency is much worse than that of model-dependent metrics.
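The abstract describes MCCP only in words, so a small illustration may help. The following is a minimal sketch and not the authors' implementation. It assumes a toy nearest-centroid model as the underlying learner, a hypothetical distance-difference nonconformity score, stratified folds, and pooling of calibration counts across folds; the aggregation used in the study may differ. All function names (`fit_centroids`, `mccp_p_values`, and so on) are invented for this example.

```python
import math
import random


def make_toy_data(n_inactive=300, n_active=30, seed=1):
    """Synthetic 1:10 imbalanced 2-D data (illustrative only, not real bioactivity data)."""
    rng = random.Random(seed)
    inactive = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(n_inactive)]
    active = [([rng.gauss(4, 1), rng.gauss(4, 1)], 1) for _ in range(n_active)]
    return inactive + active


def fit_centroids(train):
    """Toy underlying model (an assumption): one mean vector per class."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def nonconformity(model, x, label):
    """Hypothetical model-dependent score: larger means x looks less like `label`."""
    own = math.dist(x, model[label])
    other = min(math.dist(x, m) for c, m in model.items() if c != label)
    return own - other


def stratified_folds(data, k, rng):
    """Deal each class round-robin into k folds (assumes >= 2 examples per class)."""
    by_class = {}
    for i, (_, y) in enumerate(data):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def mccp_p_values(data, x_new, k=5, seed=0):
    """Class-conditional (Mondrian) p-values pooled over k calibration folds."""
    labels = sorted({y for _, y in data})
    greater_equal = {c: 0 for c in labels}
    n_calibration = {c: 0 for c in labels}
    for fold in stratified_folds(data, k, random.Random(seed)):
        held_out = set(fold)
        model = fit_centroids([d for i, d in enumerate(data) if i not in held_out])
        for c in labels:
            score_new = nonconformity(model, x_new, c)
            for i in fold:
                x, y = data[i]
                if y == c:  # Mondrian step: compare only against true members of class c
                    n_calibration[c] += 1
                    greater_equal[c] += nonconformity(model, x, y) >= score_new
    return {c: (greater_equal[c] + 1) / (n_calibration[c] + 1) for c in labels}


def prediction_region(p_values, epsilon):
    """Labels that cannot be rejected at significance level epsilon."""
    return {c for c, p in p_values.items() if p > epsilon}


data = make_toy_data()
p_values = mccp_p_values(data, [4.0, 4.0])
print(p_values, prediction_region(p_values, epsilon=0.1))
```

Because each label's p-value is calibrated only against compounds of that same class, the minority (active) class gets its own error control rather than being swamped by the majority class. The prediction region may contain one label, both labels, or none, depending on the chosen significance level.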