Cluster-oriented instance selection for classification problems

instance selection algorithm; data reduction; classification problems; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; 006
DOI: 10.1016/j.ins.2022.04.036
Publication Date: 2022-04-21T15:11:12Z
ABSTRACT
More training instances can lead to better classification accuracy. However, accuracy can also degrade if the additional instances introduce more noise and outliers, and additional training instances arguably require more computational resources in subsequent data mining operations. Instance selection algorithms identify subsets of training instances that ideally increase accuracy, or at least do not decrease it significantly. Many instance selection algorithms exist, but no single algorithm dominates the others in general. Moreover, existing instance selection algorithms do not allow direct control of the instance selection rate. In this paper, we present a simple and generic cluster-oriented instance selection algorithm for classification problems. Our proposed algorithm runs an unsupervised K-Means clustering algorithm on the training instances and, for a given selection rate, selects instances from the centers and the borders of the clusters. On 24 benchmark classification problems, when the instance selection algorithms being compared select very similar percentages of instances, K-Nearest Neighbours classifiers achieve more than 2%–3% better accuracy using instances selected by our proposed method than using those selected by other state-of-the-art generic instance selection algorithms.
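The abstract gives only the outline of the method, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes that "center" instances are those nearest to their cluster centroid, that "border" instances are those farthest from it, and that each cluster receives a share of the selection budget proportional to its size. The names `kmeans`, `select_instances`, and `center_share` are hypothetical and do not come from the paper.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return [nearest(p) for p in points], centroids


def select_instances(points, rate, k=3, center_share=0.5, seed=0):
    """Return sorted indices of roughly rate * len(points) instances.

    ASSUMPTIONS (not from the paper): per-cluster budget is proportional
    to cluster size; center = nearest to centroid; border = farthest from
    centroid; center_share splits the budget between the two groups.
    """
    labels, centroids = kmeans(points, k, seed=seed)
    chosen = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        # At least one instance per non-empty cluster, never more than it holds.
        budget = min(len(idx), max(1, round(rate * len(idx))))
        n_center = round(center_share * budget)
        n_border = budget - n_center
        # Sort cluster members from nearest to farthest from the centroid.
        idx.sort(key=lambda i: math.dist(points[i], centroids[j]))
        chosen.extend(idx[:n_center])                 # center instances
        chosen.extend(idx[len(idx) - n_border:])      # border instances
    return sorted(chosen)
```

Under these assumptions, a call such as `select_instances(points, rate=0.2, k=4)` returns the indices of about 20% of the training instances, which could then be used to fit a K-Nearest Neighbours classifier. Class labels are not used during selection, consistent with the abstract's description of the clustering step as unsupervised.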