xGAP: a python based efficient, modular, extensible and fault tolerant genomic analysis pipeline for variant discovery

Python Extensibility
DOI: 10.1093/bioinformatics/btaa1097 Publication Date: 2021-01-05T04:21:19Z
ABSTRACT
Since the first human genome was sequenced in 2001, there has been a rapid growth number of bioinformatic methods to process and analyze next-generation sequencing (NGS) data for research clinical studies that aim identify genetic variants influencing diseases traits. To achieve this goal, one needs call from NGS data, which requires multiple computationally intensive analysis steps. Unfortunately, is lack an open-source pipeline can perform all these steps on manner, fully automated, efficient, rapid, scalable, modular, user-friendly fault tolerant. address this, we introduce xGAP, extensible Genome Analysis Pipeline, implements modified GATK best practice DNA-seq with aforementioned functionalities.xGAP massive parallelization by splitting into many smaller regions efficient load-balancing high scalability. It 30× coverage whole-genome (WGS) ∼90 min. In terms accuracy discovered variants, xGAP achieves average F1 scores 99.37% single nucleotide 99.20% insertion/deletions across seven benchmark WGS datasets. We highly consistent results on-premises (SGE & SLURM) high-performance clusters. Compared Churchill pipeline, similar parallelization, 20% faster when analyzing 50× Amazon Web Service. Finally, tolerant where it automatically re-initiate failed processes minimize required user intervention.xGAP available at https://github.com/Adigorla/xgap.Supplementary are Bioinformatics online.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (35)
CITATIONS (0)