SPRING: a next-generation compressor for FASTQ data

Lossy compression
DOI: 10.1093/bioinformatics/bty1015 Publication Date: 2018-12-06T22:37:40Z
ABSTRACT
Motivation: High-Throughput Sequencing technologies produce huge amounts of data in the form of short genomic reads, associated quality values and read identifiers. Because of the significant structure present in these FASTQ datasets, general-purpose compressors are unable to completely exploit much of the inherent redundancy. Although there has been a lot of work on designing FASTQ compressors, most of them lack support for one or more crucial properties, such as support for variable length reads, scalability to high coverage datasets, pairing-preserving compression and lossless compression.
Results: In this work, we propose SPRING, a reference-free compressor for FASTQ files. SPRING supports a wide variety of compression modes and features, including lossless compression, pairing-preserving compression, lossy compression of quality values, long read compression and random access. SPRING achieves substantially better compression than existing tools; for example, SPRING compresses 195 GB of 25× whole genome human FASTQ from Illumina’s NovaSeq sequencer to less than 7 GB, around 1.6× smaller than previous state-of-the-art FASTQ compressors. SPRING achieves this improvement while using comparable computational resources.
Availability and implementation: SPRING can be downloaded from https://github.com/shubhamchandak94/SPRING.
Supplementary information: Supplementary data are available at Bioinformatics online.