NFDI4DS | UHH-SEMS - Publication Details

Giuseppe Insana

ORCID: 0000-0002-8186-1026

Contact & Profiles

OpenAlex ID: A5033654825

Research Areas

Genomics and Phylogenetic Studies
Machine Learning in Bioinformatics
Advanced Proteomics Techniques and Applications
Biomedical Text Mining and Ontologies
Scientific Computing and Data Management
Bioinformatics and Genomic Networks
Research Data Management Practices
Data Mining Algorithms and Applications
Natural Language Processing Techniques
Genomics and Rare Diseases
RNA and protein synthesis mechanisms
Renaissance Literature and Culture
Classical Antiquity Studies
Fractal and DNA sequence analysis
Computational Drug Discovery Methods
Language and cultural evolution
Genetics, Bioinformatics, and Biomedical Research
Semantic Web and Ontologies
Enzyme Structure and Function

European Bioinformatics Institute
2019-2024

UniProt: the universal protein knowledgebase in 2021

OPENALEX - Publications

Alex Bateman María Martin Sandra Orchard Michele Magrane Rahat Agivetova and 95 more

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from...

10.1093/nar/gkaa1100 article EN cc-by Nucleic Acids Research 2020-11-02

UniProt: the Universal Protein Knowledgebase in 2023

Alex Bateman María Martin Sandra Orchard Michele Magrane Shadab Ahmad and 95 more

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature...

10.1093/nar/gkac1052 article EN cc-by Nucleic Acids Research 2022-11-21

Annotation of biologically relevant ligands in UniProtKB using ChEBI

Elisabeth Coudert Sébastien Géhant Edouard de Castro Monica Pozzato Delphine Baratin and 95 more

Motivation: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. Results: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from...

10.1093/bioinformatics/btac793 article EN cc-by Bioinformatics 2022-12-08

UniProt: the Universal Protein Knowledgebase in 2025

Alex Bateman María Martin Sandra Orchard Michele Magrane Aduragbemi S. Adesina and 94 more

The aim of the UniProt Knowledgebase (UniProtKB; https://www.uniprot.org/) is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe ongoing changes to our production pipeline to limit the sequences available in UniProtKB to high-quality, non-redundant reference proteomes. We continue to manually curate the scientific literature to add the latest functional data and use machine learning techniques. We also encourage community curation...

10.1093/nar/gkae1010 article EN cc-by Nucleic Acids Research 2024-11-18

FAIR adoption, assessment and challenges at UniProt

Leyla García Jerven Bolleman Sébastien Géhant Nicole Redaschi María Martin and 95 more

UniProt continues to support the ongoing process of making scientific data FAIR. Here we contribute to this process with a FAIRness assessment of our UniProtKB dataset, followed by a critical reflection on the challenges and future directions of the adoption and validation of the FAIR principles and metrics.

10.1038/s41597-019-0180-9 article EN cc-by Scientific Data 2019-09-20

Improved selection of canonical proteins for reference proteomes

Giuseppe Insana María Martin William R. Pearson

The ‘canonical’ protein sets distributed by UniProt are widely used for similarity searching, and functional and structural annotation. For many investigators, canonical sequences are the only version of a protein examined. However, higher eukaryotes often encode multiple isoforms of a protein from a single gene. For unreviewed (UniProtKB/TrEMBL) protein sequences, the longest sequence in a Gene-Centric group is chosen as canonical. This choice can create inconsistencies, selecting >95% identical orthologs with dramatically...

10.1093/nargab/lqae066 article EN cc-by NAR Genomics and Bioinformatics 2024-04-04

Improved selection of canonical proteins for reference proteomes

Giuseppe Insana María Martin William R. Pearson

The "canonical" protein sets distributed by UniProt are widely used for similarity searching, and functional and structural annotation. For many investigators, canonical sequences are the only version of a protein examined. However, higher eukaryotes often encode multiple isoforms of a protein from a single gene. For unreviewed (UniProtKB/TrEMBL) protein sequences, the longest sequence in a Gene-Centric group is chosen as canonical. This choice can create inconsistencies, selecting >95% identical orthologs with dramatically different...

10.1101/2024.03.04.583387 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2024-03-06

MBDBMetrics: An online metrics tool to measure the impact of biological data resources

Giuseppe Insana Alexandr Ignatchenko María Martin Alex Bateman and 92 more

Motivation: There now exist thousands of molecular biology databases covering every aspect of biological data. This database infrastructure takes significant effort and funding to develop and maintain. The creators of these databases need to make strong justifications to funders to prove their impact or importance. There are many publication metrics tools available, such as Google Scholar to measure citation impact and AltMetrics for multiple measures including social media coverage. Results: In this article, we describe a series of novel...

10.1093/bioadv/vbad180 article EN cc-by Bioinformatics Advances 2023-01-01