NFDI4DS | UHH-SEMS - Publication Details

Open-Source Sequence Clustering Methods Improve the State Of the Art

Benchmark (surveying)

DOI: 10.1128/msystems.00003-15

Publication Date: 2016-02-05T12:26:31Z

AUTHORS (10)

Evguenia Kopylova

Jose A. Navas-Molina

Céline Mercier

Zhenjiang Zech Xu

Frédéric Mahé

Yan He

Hong-Wei Zhou

Torbjørn Rognes

J. Gregory Caporaso

Rob Knight

ABSTRACT

Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release.

IMPORTANCE

Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
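To make the idea of clustering reads into OTUs concrete, the following is a minimal sketch of greedy centroid-based clustering followed by a Shannon (alpha) diversity estimate over the resulting cluster sizes. It is an illustration only and is not the algorithm of any tool named in the abstract. The function names (`identity`, `greedy_cluster`, `shannon_index`), the position-wise identity measure (a toy stand-in for alignment-based identity), the threshold value, and the example reads are all assumptions made for this sketch.

```python
import math
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy stand-in for alignment identity)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def greedy_cluster(reads: List[str], threshold: float) -> Dict[str, List[str]]:
    """Assign each read to the first centroid it matches at >= threshold.

    A read that matches no existing centroid becomes a new centroid.
    Input order matters, as in greedy clustering schemes generally.
    """
    clusters: Dict[str, List[str]] = {}
    for read in reads:
        for centroid in clusters:
            if identity(read, centroid) >= threshold:
                clusters[centroid].append(read)
                break
        else:
            # No centroid was similar enough: start a new cluster (OTU).
            clusters[read] = [read]
    return clusters


def shannon_index(clusters: Dict[str, List[str]]) -> float:
    """Shannon diversity (natural log) computed from cluster sizes."""
    total = sum(len(members) for members in clusters.values())
    return -sum(
        (len(m) / total) * math.log(len(m) / total) for m in clusters.values()
    )


# Hypothetical reads, invented for illustration.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "TTTTGGGGCA", "ACGTACGTAC"]
otus = greedy_cluster(reads, threshold=0.8)
for centroid, members in otus.items():
    print(centroid, len(members))
print(round(shannon_index(otus), 3))
```

In this toy setting a single erroneous read that falls below the threshold against every centroid would seed its own cluster, which is one intuitive way a spurious OTU can arise and why the choice of clustering algorithm and filtering affects diversity estimates.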

REFERENCES (43)

CITATIONS (144)
