Fec: a fast error correction method based on two-rounds overlapping and caching

0301 basic medicine 03 medical and health sciences Genome High-Throughput Nucleotide Sequencing Sequence Analysis, DNA Software Algorithms
DOI: 10.1093/bioinformatics/btac565 Publication Date: 2022-08-17T20:17:51Z
ABSTRACT
Abstract The third-generation sequencing technology has advanced genome analysis with long-read length, but the reads need error correction due to the high error rate. Error correction is a time-consuming process especially when the sequencing coverage is high. Generally, for a pair of overlapping reads A and B, the existing error correction methods perform a base-level alignment from B to A when correcting the read A. And another base-level alignment from A to B is performed when correcting the read B. However, based on our observation, the base-level alignment information can be reused. In this article, we present a fast error correction tool Fec, using two-rounds overlapping and caching. Fec can be used independently or as an error correction step in an assembly pipeline. In the first round, Fec uses a large window size (20) to quickly find enough overlaps to correct most of the reads. In the second round, a small window size (5) is used to find more overlaps for the reads with insufficient overlaps in the first round. When performing base-level alignment, Fec searches the cache first. If the alignment exists in the cache, Fec takes this alignment out and deduces the second alignment from it. Otherwise, Fec performs base-level alignment and stores the alignment in the cache. We test Fec on nine datasets, and the results show that Fec has 1.24–38.56 times speed-up compared to MECAT, CANU and MINICNS on five PacBio datasets and 1.16–27.8 times speed-up compared to NECAT and CANU on four nanopore datasets.Availability and implementationFec is available at https://github.com/zhangjuncsu/Fec.Supplementary informationSupplementary data are available at Bioinformatics online.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (14)
CITATIONS (0)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....