Open-Source Sequence Clustering Methods Improve the State of the Art
Benchmark (surveying)
DOI:
10.1128/msystems.00003-15
Publication Date:
2016-02-05T12:26:31Z
AUTHORS (10)
ABSTRACT
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against the current principal options (UCLUST and USEARCH) in QIIME, the hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All of the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release.

IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
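To make the idea of clustering reads into OTUs concrete, here is a minimal toy sketch of greedy centroid-based clustering at an identity threshold. It is not the algorithm of any tool named in the abstract: the function names `identity` and `greedy_cluster` are hypothetical, and the sketch assumes equal-length, pre-aligned reads compared position by position, whereas real tools use sequence alignment and faster heuristics.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes equal-length, pre-aligned reads)."""
    if not a or len(a) != len(b):
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Assign each read to the first centroid it matches at >= threshold.

    A read that matches no existing centroid becomes a new centroid,
    i.e. the seed of a new OTU. Returns {centroid: [member reads]}.
    """
    centroids = []
    clusters = {}
    for read in reads:
        for centroid in centroids:
            if identity(read, centroid) >= threshold:
                clusters[centroid].append(read)
                break
        else:
            # No centroid was close enough: start a new cluster.
            centroids.append(read)
            clusters[read] = [read]
    return clusters


if __name__ == "__main__":
    toy_reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"]
    for centroid, members in greedy_cluster(toy_reads, threshold=0.9).items():
        print(centroid, len(members))
```

In this kind of greedy scheme the result depends on read order and on the threshold, which is one reason different clustering algorithms can report different numbers of OTUs from the same reads.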
REFERENCES (43)
CITATIONS (144)