ScalaBLAST: A Scalable Implementation of BLAST for High-Performance Data-Intensive Bioinformatics Analysis

Memory footprint; Alignment-free sequence analysis; Biological data
DOI: 10.1109/tpds.2006.112 Publication Date: 2006-07-11T15:21:23Z
ABSTRACT
Genes in an organism's DNA (genome) have embedded in them information about proteins, which are the molecules that do most of a cell's work. A typical bacterial genome contains on the order of 5,000 genes. Mammalian genomes can contain tens of thousands of genes. For each genome sequenced, the challenge is to identify the protein components (proteome) being actively used for a given set of conditions. Fundamentally, sequence alignment is a sequence matching problem focused on unlocking the protein information embedded in the genetic code, making it possible to assemble a "tree of life" by comparing new sequences against all sequences from known organisms. However, the memory footprint of sequence data is growing more rapidly than per-node core memory. Despite years of research and development, high-performance sequence alignment applications either do not scale well, cannot accommodate very large databases in core, or require special hardware. We have developed a high-performance sequence alignment application, ScalaBLAST, which accommodates very large databases and scales linearly to as many as thousands of processors on both distributed-memory and shared-memory architectures, representing a substantial improvement over the current state of the art in high-performance sequence alignment with respect to scaling and portability. ScalaBLAST relies on a collection of techniques to achieve high performance and scalability: distributing the target database over available memory, multilevel parallelism to exploit concurrency, parallel I/O, and latency hiding through data prefetching. This demonstrated approach of database sharing combined with effective task scheduling should have broad-ranging applications to other informatics-driven sciences.
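Two of the techniques named in the abstract, distributing the target database over available memory and hiding latency through prefetching, can be illustrated with a minimal single-process sketch. This is not ScalaBLAST's code: the names `partition`, `score`, and `search` are hypothetical, a background thread stands in for a remote-memory fetch, and a toy position-match count stands in for BLAST alignment scoring.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(db, n_nodes):
    """Split the database into n_nodes contiguous, near-equal shards.

    Each shard models the slice of the target database held in one
    node's memory (an assumption made for illustration).
    """
    base, extra = divmod(len(db), n_nodes)
    shards, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        shards.append(db[start:start + size])
        start += size
    return shards


def score(query, target):
    """Toy stand-in for alignment: number of matching positions."""
    return sum(a == b for a, b in zip(query, target))


def search(query, shards):
    """Scan every shard, fetching the next one while scoring the current."""

    def fetch(i):
        # Stands in for copying a remote shard into local memory.
        return list(shards[i])

    best = None
    if not shards:
        return best
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)
        for i in range(len(shards)):
            current = pending.result()
            if i + 1 < len(shards):
                # Start the next fetch before computing on the current shard.
                pending = pool.submit(fetch, i + 1)
            for target in current:
                s = score(query, target)
                if best is None or s > best[0]:
                    best = (s, target)
    return best


db = ["ACGT", "ACGA", "TTTT", "ACCC", "GGGG"]
shards = partition(db, 2)
print(search("ACGA", shards))
```

In this sketch the fetch of shard i + 1 overlaps with scoring of shard i, which is the general idea behind prefetching; a real distributed implementation would issue non-blocking remote reads rather than use a thread pool.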