De Novo PacBio long-read and phased avian genome assemblies correct and add to genes important in neuroscience research

ENCODE Sequence assembly
DOI: 10.1101/103911 Publication Date: 2017-01-29T06:10:13Z
ABSTRACT
Reference quality genomes are expected to provide a resource for studying gene structure and function. However, often genes of interest are not completely or accurately assembled, leading to unknown errors in analyses or additional cloning efforts for the correct sequences. A promising solution to this problem is long-read sequencing. Here we tested PacBio-based long-read sequencing and diploid assembly for potential improvements to the Sanger-based intermediate-read zebra finch reference and the Illumina-based short-read Anna’s hummingbird reference, two vocal learning avian species widely studied in neuroscience and genomics. With DNA of the same individuals used to generate the reference genomes, we generated diploid assemblies with the FALCON-Unzip assembler, resulting in contigs with no gaps in the megabase range (N50s of 5.4 and 7.7 Mb, respectively), representing 150-fold and 200-fold improvements over the current zebra finch and hummingbird references, respectively. These long-read and phased assemblies corrected and resolved what we discovered to be misassemblies, including those due to erroneous sequences flanking gaps, complex repeat structure errors in the references, base call errors in difficult-to-sequence regions, and inaccurate resolution of allelic differences between the two haplotypes. We analyzed protein-coding genes widely studied in neuroscience and specialized in vocal learning species, and found numerous genes that were assembled completely, validated by single long genomic reads and transcriptome reads. These findings demonstrate, for the first time in non-human vertebrates, the impact of higher quality, phased and gap-less assemblies for understanding gene structure and function.