Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes
Proteogenomics
Proteome
Gene prediction
Sequence assembly
Gene Annotation
DOI: 10.1101/gr.201368.115
Publication Date: 2016-11-16T13:17:12Z
AUTHORS (42)
ABSTRACT
Complementing genome sequence with deep transcriptome and proteome data could enable more accurate assembly and annotation of newly sequenced genomes. Here, we provide a proof-of-concept of an integrated approach for analysis of the genome and proteome of Anopheles stephensi, which is one of the most important vectors of the malaria parasite. To achieve broad coverage of genes, we carried out transcriptome sequencing and deep proteome profiling of multiple anatomically distinct sites. Based on transcriptomic data alone, we identified and corrected 535 events of incomplete genome assembly involving 1196 scaffolds and 868 protein-coding gene models. This proteogenomic approach enabled us to add 365 genes that were missed during genome annotation and to identify 917 gene correction events through discovery of 151 novel exons, 297 protein extensions, 231 exon extensions, 192 novel protein start sites, 19 novel translational frames, 28 events of joining of exons, and 76 events of joining of adjacent genes as a single gene. Incorporation of proteomic evidence allowed us to change the designation of more than 87 predicted "noncoding RNAs" to conventional mRNAs coded by protein-coding genes. Importantly, extension of the newly corrected genome assemblies and gene models to 15 other newly assembled Anopheline genomes led to the discovery of a large number of apparent discrepancies in assembly and annotation of these genomes. Our data provide a framework for how future genome sequencing efforts should incorporate transcriptomic and proteomic analysis in combination with simultaneous manual curation to achieve near complete assembly and annotation of genomes.