
Accelerating high-dimensional clustering with lossless data reduction

Fungal Proteins; Proteomics; 0301 basic medicine; 03 medical and health sciences; Gene Expression Regulation, Fungal; Yeasts; Cluster Analysis; Computational Biology; Humans; DNA Methylation; Algorithms; Software

DOI: 10.1093/bioinformatics/btx328
Publication Date: 2017-05-16T19:11:09Z

AUTHORS (4)

Bahjat F Qaqish

Jonathon J O’Brien

Jonathan C Hibbard

Katie J Clowers

ABSTRACT

Motivation: For cluster analysis, high-dimensional data are associated with instability, decreased classification accuracy and high computational burden. The last of these challenges can be eliminated as a serious concern. For applications where dimension reduction techniques are not implemented, we propose a temporary transformation which accelerates computations with no loss of information. The algorithm can be applied to any statistical procedure that depends only on Euclidean distances, and it can be implemented sequentially to enable analyses of data that would otherwise exceed memory limitations.

Results: The method is easily implemented in common statistical software as a standard pre-processing step. The benefit of our algorithm grows with the dimensionality of the problem and the complexity of the analysis. Consequently, our simple algorithm not only decreases the computation time for routine analyses, it also opens the door to performing calculations that might otherwise have been too burdensome to attempt.

Availability and implementation: R, Matlab and SAS/IML code for implementing lossless data reduction is freely available in the Appendix.
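The Appendix code is not reproduced on this page, so the following is only a sketch of one standard way to obtain a distance-preserving reduction, and it may differ from the authors' implementation. It relies on the fact that Euclidean distances between the n rows of an n x p matrix X depend only on the Gram matrix G = X X^T, so any n x n matrix Y with Y Y^T = G gives identical pairwise distances. Here G is accumulated over column blocks (to mirror the sequential, memory-limited use mentioned in the abstract) and Y is taken as its Cholesky factor. The function names, the block size and the simulated data are illustrative assumptions, and the Cholesky step assumes the rows of X are linearly independent.

```python
import math
import random


def gram_in_chunks(rows, chunk_size):
    """Accumulate G = X X^T one column block at a time (hypothetical helper)."""
    n, p = len(rows), len(rows[0])
    gram = [[0.0] * n for _ in range(n)]
    for start in range(0, p, chunk_size):
        # Only this block of columns is needed at this step.
        block = [row[start:start + chunk_size] for row in rows]
        for i in range(n):
            for j in range(i + 1):
                s = sum(a * b for a, b in zip(block[i], block[j]))
                gram[i][j] += s
                if i != j:
                    gram[j][i] += s
    return gram


def cholesky(gram):
    """Return lower-triangular L with L L^T = gram.

    Assumes gram is positive definite, i.e. the rows of X are
    linearly independent; otherwise the square root below fails.
    """
    n = len(gram)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(gram[i][i] - s)
            else:
                low[i][j] = (gram[i][j] - s) / low[j][j]
    return low


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Simulated example: n = 6 observations, p = 500 variables (made-up data).
random.seed(0)
n, p = 6, 500
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

# Reduced data: n rows, but only n columns instead of p.
Y = cholesky(gram_in_chunks(X, chunk_size=64))

# Largest difference between original and reduced pairwise distances.
max_gap = max(
    abs(euclidean(X[i], X[j]) - euclidean(Y[i], Y[j]))
    for i in range(n) for j in range(i)
)
print(len(Y[0]), max_gap)
```

In this sketch the reduced matrix Y would stand in for X in any routine that uses only pairwise Euclidean distances between rows. The gap between the two sets of distances should be at the level of floating-point rounding. If some rows of X are linearly dependent, a different factorization of G (for example an eigendecomposition) would be needed in place of the Cholesky step.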
