
Degrees of freedom and model selection for k-means clustering

FOS: Computer and information sciences; Computer Science - Machine Learning; Statistics - Machine Learning; Machine Learning (stat.ML); 0101 mathematics; 01 natural sciences; Machine Learning (cs.LG)

DOI: 10.1016/j.csda.2020.106974

Publication Date: 2020-04-13T15:01:59Z


AUTHORS (1)

David P. Hofmeyr

ABSTRACT

This paper investigates the model degrees of freedom in k-means clustering. An extension of Stein's lemma provides an expression for the effective degrees of freedom in the k-means model. Approximating the degrees of freedom in practice requires simplifications of this expression; however, empirical studies support the appropriateness of our proposed approach. The practical relevance of this new degrees of freedom formulation for k-means is demonstrated through model selection using the Bayesian Information Criterion. The reliability of this method is validated through experiments on simulated data as well as on a large collection of publicly available benchmark data sets from diverse application areas. Comparisons with popular existing techniques indicate that this approach is extremely competitive for selecting high-quality clustering solutions. Code to implement the proposed approach is available in the form of an R package from https://github.com/DavidHofmeyr/edfkmeans.
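The role that a degrees of freedom estimate plays in BIC-based selection of k can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's method and not the API of the edfkmeans package: the BIC form below assumes a spherical Gaussian model with one shared variance, the function names (`naive_df`, `bic_score`, `select_k`) are hypothetical, the sums of squares are made-up numbers, and the naive count k*d stands in for the effective degrees of freedom estimate, which is not reproduced here.

```python
import math
from typing import Callable, Dict


def naive_df(k: int, d: int) -> float:
    """Placeholder degrees of freedom: one parameter per centroid coordinate.

    Assumption for illustration only; an effective degrees of freedom
    estimate could be passed to select_k in its place.
    """
    return float(k * d)


def bic_score(sse: float, n: int, d: int, df: float) -> float:
    """BIC under an assumed spherical Gaussian model with a shared variance.

    sse is the total within-cluster sum of squares of a k-means solution
    fitted to n points in d dimensions.
    """
    if sse <= 0:
        raise ValueError("sse must be positive")
    return n * d * math.log(sse / (n * d)) + math.log(n) * df


def select_k(
    sse_by_k: Dict[int, float],
    n: int,
    d: int,
    df_fn: Callable[[int, int], float] = naive_df,
) -> int:
    """Return the number of clusters whose solution has the smallest BIC."""
    scores = {k: bic_score(sse, n, d, df_fn(k, d)) for k, sse in sse_by_k.items()}
    return min(scores, key=scores.get)


# Hypothetical within-cluster sums of squares for n = 150 points in d = 2.
sse_by_k = {1: 2400.0, 2: 900.0, 3: 300.0, 4: 295.0, 5: 291.0}
best_k = select_k(sse_by_k, n=150, d=2)
```

With these made-up values the minimum falls at k = 3. Passing the degrees of freedom as a function keeps the selection rule separate from the estimate, so a different estimate can be swapped in without touching the BIC computation.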


REFERENCES (22)

CITATIONS (12)
