NFDI4DS | UHH-SEMS - Publication Details

Parallel Implementation of Density Peaks Clustering Algorithm Based on Spark

Subject classification: 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology

DOI: 10.1016/j.procs.2017.03.138
Publication Date: 2017-04-08T05:15:35Z


AUTHORS (5)

Rui Liu

Xiaoge Li

Liping Du

Shuting Zhi

Mian Wei

ABSTRACT

Clustering algorithms are widely used in data mining. They group elements into clusters so that elements in the same cluster are more similar to each other than to elements in other clusters. Distance-based algorithms can only find clusters of nearly circular shape. The recently published density peaks clustering algorithm overcomes this limitation: it discovers clusters of arbitrary shape and is insensitive to noise. However, it must calculate the distances between all pairs of data points, so it does not scale to big data. To reduce the computational cost of the algorithm, we propose an efficient distributed density peaks clustering algorithm based on Spark's GraphX. This paper demonstrates the effectiveness of the method on two different data sets. The experimental results show that our system improves performance significantly (up to 10x) compared to a MapReduce implementation. We also evaluate the expansibility and scalability of our system.
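The abstract does not spell out the algorithm itself, so a small single-machine sketch of the density peaks idea may help. This is not the authors' Spark/GraphX implementation. It assumes the common cutoff-kernel formulation, which works in four steps:

1. Each point gets a local density ρ, the number of other points closer than a cutoff `dc`.
2. Each point gets a distance δ to its nearest point of higher density.
3. The points with the largest ρ·δ become cluster centres.
4. Every remaining point inherits the label of its nearest higher-density neighbour.

The function name `density_peaks` and the parameters `dc` and `n_clusters` are hypothetical. Ties in density are broken by index, which is a simplifying assumption. The all-pairs distance matrix on the first line of the body is the O(n²) cost the abstract identifies as the obstacle to scaling.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def density_peaks(points: Sequence[Point], dc: float, n_clusters: int) -> List[int]:
    """Hypothetical sketch of density peaks clustering (cutoff kernel).

    Returns a label in range(n_clusters) for every point.
    """
    n = len(points)
    # All-pairs distances: O(n^2) time and memory, the bottleneck for big data.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    # Rank points by decreasing density; ties broken by index (an assumption),
    # so "higher density" means "earlier in this order".
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its largest distance.
            delta[i] = max(dist[i])
            continue
        # delta_i: distance to the closest point ranked above i.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Centres: the densest point (it has no neighbour to inherit a label from)
    # plus the remaining points with the largest gamma = rho * delta.
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centres = [order[0]] + [i for i in ranked if i != order[0]][: n_clusters - 1]

    labels = [-1] * n
    for label, c in enumerate(centres):
        labels[c] = label

    # Walk points in decreasing density; each non-centre copies the label of
    # its nearest higher-density neighbour, which is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

As a usage example, two tight, well-separated groups of four points, clustered with `density_peaks(points, dc=0.5, n_clusters=2)`, come back as `[0, 0, 0, 0, 1, 1, 1, 1]`. A distributed version would need to partition the distance, ρ and δ computations across workers. The abstract states that the paper does this on GraphX but gives no further detail.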


REFERENCES (9)

CITATIONS (12)
