Unifying the known and unknown microbial coding sequence space
0301 basic medicine
570
gene clusters
QH301-705.5
[SDV]Life Sciences [q-bio]
Science
Microbiology
Open Reading Frames
03 medical and health sciences
computational biology
Genome, Archaeal
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
616
[SDV.BBM]Life Sciences [q-bio]/Biochemistry, Molecular Biology
functional metagenomics
Biology (General)
Molecular Biology
Infectious disease
020
0303 health sciences
Bacteria
Q
R
microbial genomics
systems biology
phylogenomics
bioinformatics
Medicine
Metagenome
unknown function
Computational and Systems Biology
DOI: 10.1101/2020.06.30.180448
Publication Date: 2020-07-01T13:59:41Z
AUTHORS (17)
ABSTRACT
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40%-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we propose a conceptual framework and a computational workflow that bridge the known-unknown gap in genomes and metagenomes. We showcase our approach by exploring 415,971,742 genes predicted from 1,749 metagenomes and 28,941 bacterial and archaeal genomes. We quantify the extent of the unknown fraction, its diversity, and its relevance across multiple biomes. Furthermore, we provide a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria, a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
REFERENCES (114)
CITATIONS (11)