BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data

DOI: 10.3389/fdata.2021.727216 Publication Date: 2022-01-18T11:31:53Z
ABSTRACT
Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identify SSRs. However, the sequenced genome often misses several highly repetitive regions. Moreover, many non-model species have no entire genome. With recent advances in next-generation sequencing (NGS) techniques, large-scale sequence reads for any species can be rapidly generated using NGS. In this context, a number of methods have been proposed to identify thousands of SSR loci within large amounts of reads for non-model species. While the most commonly used NGS platforms on the market (e.g., the Illumina platform) generally provide short paired-end reads, merging overlapping paired-end reads has become a common step prior to the identification of SSR loci. This has posed a big data analysis challenge for traditional stand-alone tools that merge short read pairs and identify SSRs from large-scale data. In this study, we present a new Hadoop-based software program, termed BigFiRSt, to address this problem using cutting-edge big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, implemented based on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH and BigPERF address the problems of merging short read pairs and mining SSRs in a big data manner, respectively. Comprehensive benchmarking experiments show that BigFiRSt can dramatically reduce the execution times of read pair merging and SSR mining on very large-scale DNA sequence data. The excellent performance of BigFiRSt stems mainly from the use of Big Data Hadoop technology to merge read pairs and mine SSRs through parallel and distributed computing on clusters. We anticipate that BigFiRSt will be a valuable tool in the coming biological Big Data era.
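To make the notion of an SSR concrete, the following is a minimal sketch of perfect-SSR detection in a single read, using a regular expression with a back-reference. It is an illustration only and is not the algorithm used by PERF or BigPERF; the function name `find_ssrs` and the thresholds `max_motif` and `min_repeats` are assumptions chosen for this example.

```python
import re


def find_ssrs(seq, max_motif=6, min_repeats=3):
    """Return (start, end, motif, repeat_count) for perfect tandem repeats.

    Hypothetical helper for illustration; thresholds are arbitrary defaults.
    """
    # A motif of 1..max_motif bases (shortest tried first), followed by at
    # least (min_repeats - 1) further exact copies of that same motif.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        repeats = (m.end() - m.start()) // len(motif)
        hits.append((m.start(), m.end(), motif, repeats))
    return hits


if __name__ == "__main__":
    # Toy read containing the trinucleotide repeat (ACG)3.
    print(find_ssrs("TTACGACGACGTT"))
```

Scanning each read independently in this way is what makes the task amenable to the parallel, distributed processing the abstract describes, since no read depends on any other.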