- Advanced Proteomics Techniques and Applications
- Metabolomics and Mass Spectrometry Studies
- RNA and protein synthesis mechanisms
- Mass Spectrometry Techniques and Applications
- RNA modifications and cancer
- Molecular Biology Techniques and Applications
- Genomics and Phylogenetic Studies
- RNA Research and Splicing
- Bioinformatics and Genomic Networks
- Machine Learning in Bioinformatics
- Genetics, Bioinformatics, and Biomedical Research
- Malaria Research and Control
- Single-cell and spatial transcriptomics
- Gene expression and cancer classification
- Neutrophil, Myeloperoxidase and Oxidative Mechanisms
- Biotin and Related Studies
- Vaccines and immunoinformatics approaches
- Signaling Pathways in Disease
- Vibrio bacteria research studies
- Atherosclerosis and Cardiovascular Diseases
- DNA and Nucleic Acid Chemistry
- Cellular transport and secretion
- Immune Response and Inflammation
- Immune cells in cancer
- RNA regulation and disease
European Bioinformatics Institute
2015-2025
Open Targets
2021-2024
Wellcome Trust
2015-2024
Institute of Bioinformatics and Applied Biotechnology
2011-2013
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached...
The EMBL-EBI Expression Atlas is an added-value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc.) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy-to-visualise form, after expert curation to accurately represent the intended experimental design,...
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's leading mass spectrometry (MS)-based proteomics data repository and one of the founding members of the ProteomeXchange consortium. This manuscript summarizes the developments in PRIDE resources and related tools for the last three years. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 534 datasets per month. This has been possible thanks to continuous improvements in infrastructure such as a new...
Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence‐search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable profile...
The cancer biomarker field has been an object of thorough investigation in the last decades. Despite this, colorectal cancer (CRC) heterogeneity makes it challenging to identify and validate effective prognostic biomarkers for patient classification according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public data repositories, this rich source of information is vastly underused. Here, we attempted to reuse public proteomics datasets with two main objectives: i) to generate hypotheses...
Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total, we identified 15,565 phosphosites on serine, threonine, and tyrosine residues of proteins. We identified sequence motifs for phosphosites and link these motifs to enrichment of different processes, indicating downstream regulation likely caused by different kinase groups....
The phagocyte respiratory burst is crucial for innate immunity. The transfer of electrons to oxygen is mediated by a membrane-bound heterodimer, comprising gp91phox and p22phox subunits. Deficiency of either subunit leads to severe immunodeficiency. We describe Eros (essential for reactive oxygen species), a protein encoded by the previously undefined mouse gene bc017643, and show that it is essential for host defense via the NADPH oxidase. Eros is required for expression of the NADPH oxidase components, gp91phox and p22phox. Consequently, Eros-deficient mice...
Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified, or to estimate the global false localization rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic datasets, and their statistical reliability on real datasets is largely unknown, potentially leading to studies reporting...
Expression Atlas (www.ebi.ac.uk/gxa) and its newest counterpart, the Single Cell Expression Atlas (www.ebi.ac.uk/gxa/sc), are EMBL-EBI's knowledgebases for gene and protein expression and localisation in bulk and at single cell level. These resources aim to allow users to investigate their expression in normal tissue (baseline) or in response to perturbations such as disease or changes to genotype (differential) across multiple species. Users are invited to search for genes or metadata terms across species or biological conditions in a standardised, consistent interface....
The number of mass spectrometry (MS)-based proteomics datasets in the public domain keeps increasing, particularly those generated by Data Independent Acquisition (DIA) approaches such as SWATH-MS. Unlike Data Dependent Acquisition datasets, the re-use of DIA datasets has been rather limited to date, despite its high potential, due to the technical challenges involved. We introduce a (re-)analysis pipeline for SWATH-MS datasets, which includes a combination of metadata annotation protocols, automated workflows for MS data analysis,...
The PRIDE database is the largest public data repository of mass spectrometry-based proteomics data and currently stores more than 40,000 data sets covering a wide range of organisms, experimental techniques, and biological conditions. During the past few years, PRIDE has seen a significant increase in the amount of submitted data-independent acquisition (DIA) proteomics data sets. This provides an excellent opportunity for large-scale data reanalysis and reuse. We have reanalyzed 15 label-free DIA data sets across various healthy human tissues to provide...
The availability of proteomics datasets in the public domain, and in the PRIDE database in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to get organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 tissues, corresponding...
The availability of an increasingly large amount of public proteomics data sets presents an opportunity for performing combined analyses to generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa, the domestic pig, is a model organism relevant for food production and for human biomedical research. Here, we reanalyzed 14 proteomics data sets from the PRIDE database coming from pig tissues to assess baseline (without any perturbation) protein abundance in organs, encompassing a total...
The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively), to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 tissues across 14 organs, comprising 3 strains, respectively. In all...
This article describes the creation of the first expert manually curated noncoding RNA interaction networks for S. cerevisiae. The RNA-RNA and RNA-protein interaction networks have been carefully extracted from the experimental literature and made available through the IntAct database (www.ebi.ac.uk/intact). We provide an initial network analysis and compare their properties to the much larger protein-protein interaction network. We find that the proteins that bind to ncRNAs in the network contain only a small proportion of classical RNA binding domains. We also see an enrichment of WD40...
Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements and propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. Here, we implement a pipeline to systematically identify domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true...
The PRIDE database is the largest public data repository of mass spectrometry-based proteomics data and currently stores more than 40,000 datasets covering a wide range of organisms, experimental techniques and biological conditions. During the past few years, PRIDE has seen an increase in the amount of submitted Data-Independent Acquisition (DIA) proteomics datasets, in parallel with the trends in the field. This provides an excellent opportunity for large scale data reanalysis and reuse. We have systematically reanalysed 15 label-free DIA datasets across...
A-to-I RNA editing is the most common non-transient epitranscriptome modification. It plays several roles in human physiology and has been linked to several disorders. Large-scale deep transcriptome sequencing has fostered the characterization of A-to-I editing at the single nucleotide level and the development of dedicated computational resources. REDIportal is a unique and specialized database collecting ∼16 million putative A-to-I editing sites, designed to face the current challenges of epitranscriptomics. Its running version has been enriched with sites from the TCGA project (using...
Vibrio cholerae, an enteropathogenic Gram-negative bacterium, is one of the main causative agents of waterborne diseases like cholera. About one third of the organism's genome is uncharacterised, with many protein coding genes lacking structure and functional information. These proteins form a significant fraction of the genome and are crucial in understanding the organism's complete makeup. In this study we report the general function of a family of hypothetical proteins, Domain of Unknown Function 3233 (DUF3233), which are conserved across gammaproteobacteria...
Proteins and RNA functionally and physically intersect in multiple biological processes; however, currently no universal method is available to purify protein-RNA complexes. Here we introduce XRNAX, a method for the generic purification of protein-crosslinked RNA, and demonstrate its versatility to study the composition and dynamics of protein-RNA interactions by various transcriptomic and proteomic approaches. We show that XRNAX captures all RNA biotypes, and use this to characterize the sub-proteomes that interact with coding and non-coding RNAs...