Nuno A. Fonseca
- Cancer Genomics and Diagnostics
- Genomics and Phylogenetic Studies
- RNA modifications and cancer
- RNA and protein synthesis mechanisms
- Data Mining Algorithms and Applications
- Genomics and Chromatin Dynamics
- Environmental DNA in Biodiversity Studies
- Molecular Biology Techniques and Applications
- Bioinformatics and Genomic Networks
- Chromosomal and Genetic Variations
- Gene expression and cancer classification
- Logic, Reasoning, and Knowledge
- RNA Research and Splicing
- Identification and Quantification in Food
- Semantic Web and Ontologies
- Rough Sets and Fuzzy Logic
- Plant Reproductive Biology
- Plant Molecular Biology Research
- Plant and animal studies
- Evolution and Genetic Dynamics
- Epigenetics and DNA Methylation
- Advanced Database Systems and Queries
- Cancer-related molecular mechanisms research
- Microbial Community Ecology and Physiology
- Data Management and Algorithms
Centro de Investigação em Biodiversidade e Recursos Genéticos
2023-2024
European Bioinformatics Institute
2013-2023
Universidade do Porto
2009-2023
Novametrics (United Kingdom)
2021
University of Trás-os-Montes and Alto Douro
2021
University of Lisbon
2020-2021
Image Metrics (United Kingdom)
2021
Calouste Gulbenkian Foundation
2020
Center for Neurosciences
2019
University of Coimbra
2019
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions...
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data from a variety of technologies assaying functional modalities of a genome, such as gene expression or promoter occupancy. The number of experiments based on sequencing technologies, in particular RNA-seq experiments, has been increasing over the last few years, and submissions of sequencing data have overtaken microarray experiments in the last 12 months. Additionally, there is a significant increase in experiments investigating single cells, rather than bulk samples, known as single-cell...
The discovery of drivers of cancer has traditionally focused on protein-coding genes.
Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015),...
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware...
Expression Atlas is EMBL-EBI’s resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts, and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be...
Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA...
Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue...
Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version...
Tumors subvert immune cell function to evade immune responses, yet the complex mechanisms driving immune evasion remain poorly understood. Here we show that tumors induce de novo steroidogenesis in T lymphocytes to evade anti-tumor immunity. Using a transgenic steroidogenesis-reporter mouse line, we identify and characterize de novo steroidogenic immune cells, defining the global gene expression identity of these steroid-producing immune cells and gene regulatory networks by using single-cell transcriptomics. Genetic ablation of T cell steroidogenesis restricts primary...
Motivation: A ubiquitous and fundamental step in high-throughput sequencing analysis is the alignment (mapping) of the generated reads to a reference sequence. To accomplish this task, numerous software tools have been proposed. Determining the mappers that are most suitable for a specific application is not trivial. Results: This survey focuses on classifying mappers through a wide number of characteristics. The goal is to allow practitioners to compare the mappers more easily and find those that are most suitable for their specific problem. Availability: A regularly...
Transcriptional dysregulation induced by aberrant transcription factors (TF) is a key feature of cancer, but its global influence on drug sensitivity has not been examined. Here, we infer the transcriptional activity of 127 TFs through analysis of RNA-seq gene expression data newly generated for 448 cancer cell lines, combined with publicly available datasets to survey a total of 1,056 cancer cell lines and 9,250 primary tumors. Predicted TF activities are supported by their agreement with independent shRNA essentiality...
Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations....
Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires...
Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from...
This paper presents a new meta-heuristic (EPSO) built by putting together the best features of evolution strategies (ES) and particle swarm optimization (PSO). Examples of the superiority of EPSO over classical PSO are reported. The paper also describes the application of EPSO to real world problems, including an application in opto-electronics and another in power systems.
Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic...
A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation, and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based. We describe a detailed analysis of the similarities and differences between the gene...
The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the PCAWG...
Traditional detection of aquatic invasive species via morphological identification is often time-consuming and can require a high level of taxonomic expertise, leading to delayed mitigation responses. Environmental DNA (eDNA) detection approaches of multiple species using Illumina-based sequencing technology have been used to overcome these hindrances, but sample processing is often lengthy. More recently, portable nanopore sequencing technology has become available, which has the potential to make molecular detection of invasive species more widely accessible and to substantially decrease...
The genetic code is an abstraction of how mRNA codons and tRNA anticodons molecularly interact during protein synthesis; the stability and regulation of this interaction remains largely unexplored. Here, we characterized the expression of mRNA and tRNA genes quantitatively at multiple time points in two developing mouse tissues. We discovered that mRNA codon pools are highly stable over development and simply reflect the genomic background; in contrast, precise regulation of tRNA gene families is required to create the corresponding tRNA transcriptomes. The dynamic...
Accurately quantifying gene expression levels is a key goal of experiments using RNA-sequencing to assay the transcriptome. This typically requires aligning the short reads generated to the genome or transcriptome before quantifying expression of pre-defined sets of genes. Differences in the alignment/quantification tools can have a major effect upon the expression levels found, with important consequences for biological interpretation. Here we address two main issues: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data? And, how close are...