NFDI4DS | UHH-SEMS - Publication Details

Revisiting agglomerative clustering

FOS: Computer and information sciences; Computer Science - Machine Learning; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; Machine Learning (stat.ML); 02 engineering and technology; 01 natural sciences; Machine Learning (cs.LG); Statistics - Machine Learning; 0202 electrical engineering, electronic engineering, information engineering; 0101 mathematics

DOI: 10.1016/j.physa.2021.126433
Publication Date: 2021-09-27T09:26:59Z

AUTHORS (3)

Eric K. Tokuda

Cesar H. Comin

Luciano da F. Costa

ABSTRACT

An important issue in clustering concerns the avoidance of false positives while searching for clusters. This work addressed this problem for agglomerative methods, namely the single, average, median, complete, centroid and Ward's approaches, applied to unimodal and bimodal datasets following uniform, Gaussian, exponential and power-law distributions. A model of clusters was also adopted, involving a higher-density nucleus surrounded by a transition, followed by outliers. This paved the way to defining an objective means of identifying clusters from dendrograms. The adopted model also allowed the relevance of the clusters to be quantified in terms of the height of their subtrees. The results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives than the other methods. Also, several methods detected clusters not corresponding directly to the nucleus. The possibility of identifying the type of distribution was also investigated.
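
As a rough illustration of the setting described in the abstract, the sketch below runs the six linkage methods on a unimodal Gaussian sample with SciPy and inspects the two subtrees joined at the root of each dendrogram. The sample size, the seed, and the helper `root_split_summary` are assumptions made for illustration. The relative-gap score is a simple stand-in based on subtree heights, not the relevance measure or cluster model defined in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# The six agglomerative methods named in the abstract.
METHODS = ["single", "average", "median", "complete", "centroid", "ward"]


def root_split_summary(points, method):
    """Summarise the top-level split of a dendrogram (hypothetical helper).

    Returns (smaller subtree size, larger subtree size, relative gap), where
    the relative gap is (root height - tallest child height) / root height.
    """
    Z = linkage(points, method=method)  # Euclidean distances by default
    n = Z.shape[0] + 1  # number of original observations

    def size_and_height(idx):
        # Indices below n are single points; others refer to row idx - n of Z.
        idx = int(idx)
        if idx < n:
            return 1, 0.0
        row = Z[idx - n]
        return int(row[3]), float(row[2])

    size_a, h_a = size_and_height(Z[-1, 0])
    size_b, h_b = size_and_height(Z[-1, 1])
    root_h = float(Z[-1, 2])
    # Note: centroid and median linkage can produce inversions, in which case
    # a child may be taller than the root and the gap becomes negative.
    rel_gap = (root_h - max(h_a, h_b)) / root_h
    return min(size_a, size_b), max(size_a, size_b), rel_gap


# Unimodal data: one 2-D Gaussian blob, so any two-cluster reading is spurious.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))

summaries = {m: root_split_summary(points, m) for m in METHODS}
for m, (small, large, gap) in summaries.items():
    print(f"{m:>8}: sizes=({small}, {large}) relative gap={gap:.3f}")
```

A strongly unbalanced root split, where one side holds only a few points, suggests that the method is peeling off outliers and not reporting a second cluster. A balanced split with a large gap on unimodal data is the kind of false positive the abstract refers to.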

REFERENCES (39)

CITATIONS (55)
