Finding the best diversity generation procedures for mining contrast patterns

DOI: 10.1016/j.eswa.2015.02.028 Publication Date: 2015-02-26T14:20:46Z
ABSTRACT
Highlights:
- Comparison of diversity generation procedures for mining contrast patterns.
- Diversity calculated from the number of total, unique, and minimal patterns.
- Three new deterministic methods for generating diversity in decision trees.
- Study of the influence of data type on the diversity and accuracy of the methods.
- Random Forest and Bagging are the best procedures.

Most understandable classifiers are based on contrast patterns, which can be accurately mined from decision trees. Nevertheless, tree diversity must be ensured to mine a representative pattern collection. In this paper, we perform an experimental comparison of different diversity generation procedures. We compare the diversity generated by each procedure based on the number of total, unique, and minimal patterns extracted from the induced trees at different minimal support thresholds. This comparison, together with an accuracy and abstention experiment, shows that Random Forest and Bagging generate the most diverse and accurate pattern collections. Additionally, we study the influence of data type on the results, finding that Random Forest is best for categorical data and Bagging for numerical data. The comparison includes the best-known diversity generation procedures and three new deterministic procedures introduced here. These deterministic procedures outperform the existing deterministic method but are still outperformed by random procedures.
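The abstract measures diversity by counting total, unique, and minimal patterns that reach a minimal support threshold. The sketch below illustrates one way those three counts could be computed; it is not the authors' implementation. It assumes that each tree has already been turned into a list of patterns (one conjunction of conditions per root-to-leaf path, labeled with the leaf's class), that a pattern's support is measured within its own class, and that a pattern counts as minimal when no other surviving pattern is a proper subset of it. The item encoding, the toy data, and the names `holds`, `support`, and `diversity_counts` are hypothetical.

```python
# Hypothetical sketch: count total, unique, and minimal contrast patterns.
# A pattern is a frozenset of items; an item is one condition such as
# ("color", "==", "red"). Patterns are assumed to be pre-extracted from
# root-to-leaf paths of each tree and paired with the leaf's class label.

def holds(item, row):
    """Return True if the row (a dict of feature -> value) satisfies the item."""
    feature, op, value = item
    x = row[feature]
    if op == "==":
        return x == value
    if op == "<=":
        return x <= value
    if op == ">":
        return x > value
    raise ValueError(f"unknown operator {op!r}")


def support(pattern, rows):
    """Fraction of rows that satisfy every item of the pattern."""
    if not rows:
        return 0.0
    return sum(all(holds(item, row) for item in pattern) for row in rows) / len(rows)


def diversity_counts(tree_patterns, class_rows, min_support):
    """tree_patterns: one list per tree of (pattern, class_label) pairs.
    class_rows: dict mapping each class label to the rows of that class.
    Returns (total, unique, minimal) counts of patterns whose support within
    their own class is at least min_support."""
    kept = [
        pattern
        for patterns in tree_patterns
        for pattern, label in patterns
        if support(pattern, class_rows[label]) >= min_support
    ]
    unique = set(kept)
    # Minimal: no other unique pattern is a proper subset (a more general pattern).
    minimal = {p for p in unique if not any(q < p for q in unique)}
    return len(kept), len(unique), len(minimal)


# Toy data: two classes and the patterns of two hypothetical trees.
class_rows = {
    "yes": [{"age": 30, "color": "red"}, {"age": 45, "color": "red"},
            {"age": 50, "color": "blue"}],
    "no": [{"age": 20, "color": "blue"}, {"age": 25, "color": "red"}],
}
old = frozenset({("age", ">", 28)})
young = frozenset({("age", "<=", 28)})
old_red = frozenset({("age", ">", 28), ("color", "==", "red")})
old_blue = frozenset({("age", ">", 28), ("color", "==", "blue")})
trees = [
    [(old, "yes"), (young, "no")],
    [(old_red, "yes"), (old_blue, "yes"), (young, "no")],
]
print(diversity_counts(trees, class_rows, min_support=0.5))  # → (4, 3, 2)
```

With a threshold of 0.5, the `age > 28 AND color == blue` pattern (satisfied by only 1 of the 3 "yes" rows) is discarded, the `age <= 28` pattern found by both trees counts twice toward the total but once toward the unique count, and `age > 28 AND color == red` is not minimal because `age > 28` is a proper subset of it. Lowering the threshold keeps more patterns, which is why the paper compares procedures across several support values.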