- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Bacteriophages and microbial interactions
- Evolution and Genetic Dynamics
- CRISPR and Genetic Engineering
- Plant Virus Research Studies
- Microbial Community Ecology and Physiology
- Bacterial Genetics and Biotechnology
- Evolutionary Game Theory and Cooperation
- Protist diversity and phylogeny
- Insect symbiosis and bacterial influences
- Viral gastroenteritis research and epidemiology
- SARS-CoV-2 and COVID-19 Research
- Plant and Fungal Interactions Research
- RNA Research and Splicing
- Machine Learning in Bioinformatics
- Bioinformatics and Genomic Networks
- Genetic diversity and population structure
- Animal Virus Infections Studies
- Viral Infections and Vectors
- Protein Structure and Dynamics
- RNA modifications and cancer
- Origins and Evolution of Life
- COVID-19 epidemiological studies
- Microbial Metabolic Engineering and Bioproduction
- National Institutes of Health (2016-2025)
- United States National Library of Medicine (2007-2025)
- National Center for Biotechnology Information (2015-2024)
- Government of the United States of America (2021-2022)
- Howard Hughes Medical Institute (2021)
- New York University (2021)
- San Sebastián University (2021)
- Rutgers, The State University of New Jersey (2020)
- Radboud Institute for Molecular Life Sciences (2019)
- Radboud University Medical Center (2019)
The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system, based on orthologous relationships between genes, appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs)...
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial...
Lactic acid-producing bacteria are associated with various plant and animal niches and play a key role in the production of fermented foods and beverages. We report nine genome sequences representing the phylogenetic and functional diversity of these bacteria. The small genomes of lactic acid bacteria encode a broad repertoire of transporters for efficient carbon and nitrogen acquisition from the nutritionally rich environments they inhabit and reflect a limited range of biosynthetic capabilities that indicate both prototrophic and auxotrophic...
We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different...
All archaeal and many bacterial genomes contain Clustered Regularly Interspaced Short Palindrome Repeats (CRISPR) and variable arrays of the CRISPR-associated (cas) genes that have been previously implicated in a novel form of DNA repair on the basis of comparative analysis of their protein product sequences. However, the proximity of CRISPR and cas genes strongly suggests that they have related functions, which is hard to reconcile with the repair hypothesis. The sequences of the numerous cas gene products were classified into approximately 25 distinct...
Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis, which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes. We examined evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups, or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo...
The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar or, in some cases, closer phylogenetic proximity. This conservation allows the prediction of many previously...
The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI’s gene index (gi)...
The "knockout-rate" prediction holds that essential genes should be more evolutionarily conserved than are nonessential genes. This is because negative (purifying) selection acting on essential genes is expected to be more stringent than that acting on nonessential genes, which are more functionally dispensable and/or redundant. However, a recent survey of evolutionary distances between Saccharomyces cerevisiae and Caenorhabditis elegans proteins did not reveal any difference between the rates of evolution of essential and nonessential genes. An analysis of mouse and rat orthologous genes also found that essential and nonessential genes evolved at...
A computational procedure was developed for systematic detection of lineage-specific expansions (LSEs) of protein families in sequenced genomes and applied to obtain a census of LSEs in five eukaryotic species: the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the green plant Arabidopsis thaliana. A significant fraction of the proteins encoded in each of these genomes, up to 80% in A. thaliana, belong to LSEs. Many paralogous gene families in each of the analyzed species are almost...
The majority of the diverse viruses infecting eukaryotes have RNA genomes, including numerous human, animal, and plant pathogens. Recent advances in metagenomics have led to the discovery of many new groups of RNA viruses in a wide range of hosts. These findings enable a far more complete reconstruction of the evolution of RNA viruses than was attainable previously. This reconstruction reveals the relationships between different Baltimore classes of viruses and indicates extensive transfer of viruses between distantly related hosts, such as plants and animals. These results call for a major revision of the existing...
Prochlorococcus marinus, the dominant photosynthetic organism in the ocean, is found in two main ecological forms: high-light-adapted genotypes in the upper part of the water column and low-light-adapted genotypes at the bottom of the illuminated layer. P. marinus SS120, whose complete genome sequence is reported here, is an extremely low-light-adapted form. The genome of P. marinus SS120 is composed of a single circular chromosome of 1,751,080 bp with an average G+C content of 36.4%. It contains 1,884 predicted protein-coding genes with an average size of 825 bp, a single rRNA operon, and 40 tRNA genes. Together with the 1.66-Mbp genome of P. marinus MED4, one...
Bacteria and archaea are frequently attacked by viruses and other mobile genetic elements and rely on dedicated antiviral defense systems, such as restriction endonucleases and CRISPR, to survive. The enormous diversity of viruses suggests that more types of defense systems exist than are currently known. By systematic defense gene prediction and heterologous reconstitution, here we discover 29 widespread antiviral gene cassettes, collectively present in 32% of all sequenced bacterial and archaeal genomes, that mediate protection against specific bacteriophages...