Guy Cochrane
- Genomics and Phylogenetic Studies
- Microbial Community Ecology and Physiology
- Environmental DNA in Biodiversity Studies
- RNA and protein synthesis mechanisms
- Research Data Management Practices
- Scientific Computing and Data Management
- Bacteriophages and microbial interactions
- Species Distribution and Climate Change
- Genetics, Bioinformatics, and Biomedical Research
- Gene expression and cancer classification
- Protist diversity and phylogeny
- Biomedical Text Mining and Ontologies
- RNA modifications and cancer
- Cancer Genomics and Diagnostics
- CRISPR and Genetic Engineering
- Molecular Biology Techniques and Applications
- Rangeland and Wildlife Management
- Coral and Marine Ecosystems Studies
- Bioinformatics and Genomic Networks
- Marine and fisheries research
- Gut microbiota and health
- Invertebrate Taxonomy and Ecology
- SARS-CoV-2 and COVID-19 Research
- Cancer-related molecular mechanisms research
- Algorithms and Data Compression
European Bioinformatics Institute
2016-2025
Wellcome Trust
2009-2023
Bulgarian Academy of Sciences
2023
Institute of Biodiversity and Ecosystem Research
2023
Pensoft Publishers (Bulgaria)
2023
University of Tartu Natural History Museum and Botanical Garden
2023
Centre for Genomic Regulation
2022
SIB Swiss Institute of Bioinformatics
2022
University of Lausanne
2022
University of Newcastle Australia
2018
We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide...
Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware...
The members of the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org) set out to capture, preserve and present globally comprehensive public domain nucleotide sequence information. The work of the long-standing collaboration includes the provision of data formats, annotation conventions and routine global data exchange. Among the many developments to INSDC resources in 2011 are the newly launched BioProject database and improved handling of assembly information. In this article, we outline INSDC services and update the reader on developments in 2011.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank...
The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for the structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.
This paper presents standards and best practices for reporting genome sequences of uncultivated viruses. We present an extension of the Minimum Information about any (x) Sequence (MIxS) standard for reporting sequences of uncultivated virus genomes. Minimum Information about an Uncultivated Virus Genome (MIUViG) standards were developed within the Genomic Standards Consortium framework and include virus origin, genome quality, genome annotation, taxonomic classification, biogeographic distribution and in silico host prediction. Community-wide adoption of MIUViG standards, which complement the Minimum Information about a Single...
MGnify (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach has been unveiled (version 5.0), replacing the previous single pipeline with multiple pipelines tailored...
Transects of the submersible Alvin across rock outcrops in the Oregon subduction zone have furnished information on the structural and stratigraphic framework of this accretionary complex. Communities of clams and tube worms, and authigenic carbonate mineral precipitates, are associated with venting sites of cool fluids located on a fault-bend anticline at a water depth of 2036 meters. The distribution of animals and carbonates suggests up-dip migration of fluids from both shallow and deep sources along permeable strata or fault zones within...
Data storage costs have become an appreciable proportion of total cost in the creation and analysis of DNA sequence data. Of particular concern is that the rate of increase in DNA sequencing is significantly outstripping the rate of increase in disk storage capacity. In this paper we present a new reference-based compression method that efficiently compresses DNA sequences for storage. Our approach works for resequencing experiments that target well-studied genomes. We align new sequences to a reference genome and then encode the differences between the new sequence and the reference genome for storage. Our compression method is most efficient when we allow...
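The align-then-encode-differences idea in the abstract above can be illustrated with a minimal sketch. It assumes an ungapped alignment whose position is already known and handles substitutions only; the function names `encode_read` and `decode_read` and the record layout are hypothetical, chosen for illustration, and are not the published method or any real file format.

```python
from typing import List, Tuple

# (offset within the read, base that replaces the reference base)
Diff = Tuple[int, str]
# (alignment start on the reference, read length, substitutions)
Record = Tuple[int, int, List[Diff]]


def encode_read(reference: str, read: str, pos: int) -> Record:
    """Encode an already-aligned, ungapped read as its differences
    from the reference (substitutions only; an illustrative assumption)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, base) for i, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return pos, len(read), diffs


def decode_read(reference: str, record: Record) -> str:
    """Rebuild the read from the reference plus the stored differences."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGT"
read = "GTACCTAC"  # aligns at position 2 with one substitution
record = encode_read(reference, read, pos=2)
restored = decode_read(reference, record)
```

A read that matches the reference exactly is stored as a position and a length with an empty difference list, which is where the saving comes from for resequencing data against a well-studied genome.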
The ocean is home to myriad small planktonic organisms that underpin the functioning of marine ecosystems. However, their spatial patterns of diversity and the underlying drivers remain poorly known, precluding projections of their responses to global changes. Here we investigate the latitudinal gradients and predictors of plankton diversity across archaea, bacteria, eukaryotes, and major virus clades using both molecular and imaging data from Tara Oceans. We show a decline of diversity for most planktonic groups toward the poles, mainly driven by decreasing...
Ocean microbial communities strongly influence the biogeochemistry, food webs, and climate of our planet. Despite recent advances in understanding their taxonomic and genomic compositions, little is known about how their transcriptomes vary globally. Here, we present a dataset of 187 metatranscriptomes and 370 metagenomes from 126 globally distributed sampling stations and establish a resource of 47 million genes to study community-level transcriptomes across depth layers from pole-to-pole. We examine gene expression changes and community...
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences, collating information on ncRNA sequences of all types from a broad range of organisms. We have recently added a new genome mapping pipeline that identifies genomic locations for ncRNA sequences in 296 species. We have also added several new types of functional annotations, such as tRNA secondary structures, Gene Ontology annotations, and miRNA-target interactions. A new quality control mechanism based on Rfam family assignments identifies potential contamination, incomplete sequences, and more. The RNAcentral database has become...
A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, an overview of its range of current activities, and a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public...
RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on...
The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been the core infrastructure for collecting and providing nucleotide sequence data and metadata for >30 years. Three partner organizations, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes...
EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include both new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass...
The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided,...
Whereas DNA viruses are known to be abundant, diverse, and commonly key ecosystem players, RNA viruses are insufficiently studied outside disease settings. In this study, we analyzed ≈28 terabases of Global Ocean RNA sequences to expand Earth's RNA virus catalogs and their taxonomy, investigate their evolutionary origins, and assess their marine biogeography from pole to pole. Using new approaches to optimize discovery and classification, we identified RNA viruses that necessitate substantive revisions of taxonomy (doubling phyla and adding >50% new classes)...
The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org) comprises three global partners committed to capturing, preserving and providing comprehensive public-domain nucleotide sequence information. The INSDC establishes standards, formats and protocols for data and metadata to make it easier for individuals and organisations to submit their nucleotide data reliably to public archives. This work enables the continuous, global exchange of information about living things. Here we present an update of the INSDC in 2015,...
For more than 30 years, the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been committed to capturing, preserving and providing access to comprehensive public domain nucleotide sequence and associated metadata which enables discovery in biomedicine, biodiversity and biological sciences. Since 1987, the DNA Data Bank of Japan (DDBJ) at the National Institute for Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI)...
A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated...