Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering
Dendrogram
Linkage (software)
DOI:
10.48550/arxiv.2404.19019
Publication Date:
2024-04-29
AUTHORS (4)
ABSTRACT
Computing a Single-Linkage Dendrogram (SLD) is key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree $T$, SLD of $T$ binary dendrogram that summarizes $n-1$ clusterings obtained by contracting edges order weight. Existing algorithms for computing all require $\Omega(n\log n)$ work where $n = |T|$. Furthermore, to best our knowledge no prior provides parallel algorithm obtaining non-trivial speedup this problem. In paper, we design faster SLDs both theory and practice based on new structural results about SLDs. particular, obtain deterministic output-sensitive contraction requires $O(n \log h)$ $O(\log^2 n \log^2 depth, $h$ height output SLD. We also give bottom-up problem inspired nearest neighbor chain agglomerative clustering, show it achieves $O(n\log $O(h depth. Our are novel divide-and-conquer framework building SLDs, Cartesian trees. can quickly compute billion-scale trees, up 150x over highly-efficient Union-Find typically used practice.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES ()
CITATIONS ()
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....