Michael L. Tress
- RNA and protein synthesis mechanisms
- Genomics and Phylogenetic Studies
- RNA modifications and cancer
- Protein Structure and Dynamics
- RNA Research and Splicing
- Enzyme Structure and Function
- Machine Learning in Bioinformatics
- Bioinformatics and Genomic Networks
- Molecular Biology Techniques and Applications
- Advanced Proteomics Techniques and Applications
- Genomics and Chromatin Dynamics
- Computational Drug Discovery Methods
- Chromosomal and Genetic Variations
- Cancer-related molecular mechanisms research
- Genomics and Rare Diseases
- Protein Degradation and Inhibitors
- Genetics, Bioinformatics, and Biomedical Research
- Microbial Metabolic Engineering and Bioproduction
- Microbial Natural Products and Biosynthesis
- Genetics and Neurodevelopmental Disorders
- Genetic and phenotypic traits in livestock
- Monoclonal and Polyclonal Antibodies Research
- Biomedical Text Mining and Ontologies
- Machine Learning in Materials Science
- PI3K/AKT/mTOR signaling in cancer
Spanish National Cancer Research Centre
2015-2024
Centro de Investigación del Cáncer
2012-2016
Spanish National Centre for Cardiovascular Research
2015
Dana-Farber Cancer Institute
2012
Cancer Research Center
2012
Biocom
2009
University of Manchester
2007
European Bioinformatics Institute
2007
Instituto de Salud Carlos III
2007
Consejo Superior de Investigaciones Científicas
2004-2006
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available...
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational groups who work together to improve and extend the GENCODE annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual...
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in...
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96%...
High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how...
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for...
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin...
Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also...
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated...
Genomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction.
Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing is more frequent than commonly suggested,...
The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues, and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems...
The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the 'principal' isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species...
Chimeric RNAs comprise exons from two or more different genes and have the potential to encode novel proteins that alter cellular phenotypes. To date, numerous putative chimeric transcripts have been identified among the ESTs isolated from several organisms and using high throughput RNA sequencing. The few corresponding protein products that have been characterized mostly result from chromosomal translocations and are associated with cancer. Here, we systematically establish that some of the putative chimeric transcripts are genuinely expressed in human cells. Using...
Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells, or produce dominant isoforms in a manner that is either tissue-specific or regardless of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight human proteomics...
Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detect the translation to protein of "novel" and "putative"...
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by...
The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than...
Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggest that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A...
APPRIS (https://appris.bioinfo.cnio.es) is a well-established database housing annotations for protein isoforms for a range of species. APPRIS selects principal isoforms based on protein structure and function features and on cross-species conservation. Most coding genes produce a single main protein isoform, and the principal isoforms chosen by the APPRIS database best represent this main cellular isoform. Human genetic data, experimental protein evidence and the distribution of clinical variants all support the relevance of APPRIS principal isoforms. APPRIS annotations and principal isoforms have now been expanded to 10 model organisms. In this paper we highlight the most...
Germline loss-of-function variants in CTNNB1 cause neurodevelopmental disorder with spastic diplegia and visual defects (NEDSDV; OMIM 615075) and are the most frequent, recurrent monogenic cause of cerebral palsy (CP). We investigated the range of clinical phenotypes owing to disruptions of CTNNB1 to determine the association between NEDSDV and CP. Genetic information from 404 individuals with collectively 392 pathogenic CTNNB1 variants was ascertained for the study. From these, detailed phenotypes for 52 previously unpublished individuals were collected and combined with 68 previously published...
GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result. Meanwhile, we are...