Cluster-oriented instance selection for classification problems

instance selection algorithm; data reduction; classification problems; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; 006

DOI: 10.1016/j.ins.2022.04.036
Publication Date: 2022-04-21T15:11:12Z

AUTHORS (5)

Soumitra Saha

Partho Sarathi Sa...

Alam Al Saud

Swakkhar Shatabda

M.A. Hakim Newton

ABSTRACT

More training instances can lead to better classification accuracy. However, accuracy can also degrade if the additional instances bring more noise and outliers. Additional training instances also arguably require additional computational resources in later data mining operations. Instance selection algorithms identify subsets of training instances that ideally increase accuracy, or at least do not decrease it significantly. Many instance selection algorithms exist, but in general no single algorithm dominates the others. Moreover, existing instance selection algorithms do not allow direct control of the instance selection rate. In this paper, we present a simple and generic cluster-oriented instance selection algorithm for classification problems. Our proposed algorithm runs an unsupervised K-Means clustering algorithm on the training instances and, for a given selection rate, selects instances from the centers and the borders of the clusters. On 24 benchmark classification problems, when very similar percentages of instances are selected by the various instance selection algorithms, K Nearest Neighbours classifiers achieve more than 2%–3% better accuracy when using instances selected by our proposed method than when using those selected by other state-of-the-art generic instance selection algorithms.
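The idea in the abstract (cluster the training instances, then keep a fixed fraction from the centers and borders of the clusters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact selection rule, so several details here are assumptions. "Center" is taken to mean the instances nearest their cluster centroid and "border" the instances farthest from it. The per-cluster quota, the `center_share` split, the small `kmeans` helper and the function name `select_instances` are all hypothetical. Only the standard library is used, so the sketch runs without extra packages.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct instances as initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def select_instances(points, k, rate, center_share=0.5, seed=0):
    """Return sorted indices of roughly rate * len(points) instances.

    ASSUMPTION: each cluster contributes in proportion to its size, split
    between its most central and its most peripheral members.
    """
    points = [tuple(p) for p in points]
    centroids, labels = kmeans(points, k, seed=seed)
    selected = []
    for c in range(k):
        # Members of cluster c, ordered from nearest to farthest from its centroid.
        members = sorted(
            (i for i, lab in enumerate(labels) if lab == c),
            key=lambda i: math.dist(points[i], centroids[c]),
        )
        if not members:
            continue
        # Every non-empty cluster keeps at least one instance (an assumption).
        quota = min(len(members), max(1, round(rate * len(members))))
        n_center = math.ceil(quota * center_share)
        n_border = quota - n_center
        selected.extend(members[:n_center])                 # center instances
        selected.extend(members[len(members) - n_border:])  # border instances
    return sorted(selected)


# Toy data: two well-separated 2-D blobs of 50 instances each.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
points += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(50)]
selected = select_instances(points, k=2, rate=0.2)
print(len(selected), "of", len(points), "instances kept")
```

In a setup like this, the selected indices would then be used to subset the training data and labels before fitting a classifier such as K Nearest Neighbours. Because the quota is computed per cluster from the requested rate, the overall fraction of instances kept stays close to `rate`, which is the kind of direct control over the selection rate that the abstract describes.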

REFERENCES (50)

CITATIONS (30)