Jonathan M. Mudge
- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- RNA modifications and cancer
- Genomics and Chromatin Dynamics
- Cancer-related molecular mechanisms research
- Machine Learning in Bioinformatics
- Molecular Biology Techniques and Applications
- RNA Research and Splicing
- Genomics and Rare Diseases
- Chromosomal and Genetic Variations
- Advanced Proteomics Techniques and Applications
- Genomic variations and chromosomal abnormalities
- Genetics, Bioinformatics, and Biomedical Research
- CRISPR and Genetic Engineering
- Gene expression and cancer classification
- Bioinformatics and Genomic Networks
- Cancer Genomics and Diagnostics
- Metabolomics and Mass Spectrometry Studies
- Genetics and Neurodevelopmental Disorders
- Genetic Mapping and Diversity in Plants and Animals
- Biomedical Text Mining and Ontologies
- Genetic and phenotypic traits in livestock
- Epilepsy research and treatment
- Mitochondrial Function and Pathology
- Circular RNAs in diseases
European Bioinformatics Institute
2017-2025
Blackstone (United States)
2023
Wellcome Sanger Institute
2006-2019
Wellcome Trust
2019
University of California, Santa Cruz
2013
University Hospital of Geneva
2012
Newcastle University
2003-2004
Centre for Life
2004
Health Sciences and Nutrition
1996
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The consortium includes experimental and computational biology groups who work together to improve and extend the GENCODE annotation. Specifically, we generate primary data, create bioinformatics tools to support expert manual...
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, gene...
Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep and comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl...
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces....
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the largest expansion of the resource since its...
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data, bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools and analysis, and the advances they support in...
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative genomics capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of human variants and added regulatory regions for many mouse...
Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of reference genomes through global biodiversity initiatives. In parallel, there have been major...
Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference human and mouse genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl and UCSC Genome Browsers. Importantly,...
GENCODE produces high-quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in tools, data and analysis, and the major collaborations which underpin...
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE [1] and RefSeq [2] launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we...
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world's largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface...
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of reference genomes, alongside major advances in pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid...
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated...
Annotation on the reference genome of the C57BL6/J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principal focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, alongside experimental and in silico validation pipelines from other members of the consortium....
The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) [1], we show that one explanation for...
All species continuously evolve short open reading frames (sORFs) that can be templated for protein synthesis and may provide raw materials for evolutionary adaptation. We analyzed the origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and had emerged de novo. We additionally identified 221 previously missed sORFs potentially translated into peptides of up to 15 amino acids, all of which are smaller than the smallest human microprotein annotated to date. To investigate the bioactivity of sORF-encoded...
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that...
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved...
Ribosome profiling (Ribo-Seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of noncanonical sites of ribosome translation outside the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7000 noncanonical ORFs are translated, which, at first glance, has the potential to expand the number of human protein CDSs by 30%, from ∼19,500 to over 26,000 CDSs. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce...
Objective: To examine the association between environmental exposure to lead and children's intelligence at age 11-13 years, and to assess the implications of exposure to lead in the first seven years of life for later childhood development. Design: Prospective cohort study. Subjects: 375 children born in or around the lead smelting town of Port Pirie, Australia, between 1979 and 1982. Main outcome measure: Children's intelligence quotient (IQ) measured at age 11-13 years. Results: IQ was inversely associated with both...
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University...
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, the HUGO Gene Nomenclature Committee, Mouse Genome Informatics and the University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple...