Giuseppe Insana
- Genomics and Phylogenetic Studies
- Machine Learning in Bioinformatics
- Advanced Proteomics Techniques and Applications
- Biomedical Text Mining and Ontologies
- Scientific Computing and Data Management
- Bioinformatics and Genomic Networks
- Research Data Management Practices
- Data Mining Algorithms and Applications
- Natural Language Processing Techniques
- Genomics and Rare Diseases
- RNA and Protein Synthesis Mechanisms
- Renaissance Literature and Culture
- Classical Antiquity Studies
- Fractal and DNA Sequence Analysis
- Computational Drug Discovery Methods
- Language and Cultural Evolution
- Genetics, Bioinformatics, and Biomedical Research
- Semantic Web and Ontologies
- Enzyme Structure and Function
European Bioinformatics Institute
2019-2024
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made to the resource over the last two years. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from...
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million, and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature...
Motivation: To provide high-quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. Results: We structured the data model for cognate ligand binding site annotations and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from...
The aim of the UniProt Knowledgebase (UniProtKB; https://www.uniprot.org/) is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe ongoing changes to our production pipeline to limit the sequences available in UniProtKB to high-quality, non-redundant reference proteomes. We continue to manually curate the scientific literature to add the latest functional data and use machine learning techniques. We also encourage community curation...
UniProt continues to support the ongoing process of making scientific data FAIR. Here we contribute to this with a FAIRness assessment of our UniProtKB dataset, followed by a critical reflection on the challenges and future directions for the adoption and validation of FAIR principles and metrics.
The "canonical" protein sets distributed by UniProt are widely used for similarity searching, and for functional and structural annotation. For many investigators, canonical sequences are the only version of a protein examined. However, higher eukaryotes often encode multiple isoforms from a single gene. For unreviewed (UniProtKB/TrEMBL) sequences, the longest sequence in the Gene-Centric group is chosen as canonical. This choice can create inconsistencies, selecting >95% identical orthologs with dramatically different...
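The longest-sequence rule for unreviewed entries can be sketched in a few lines of Python. This is a hypothetical illustration, not UniProt's production pipeline. The `Isoform` record, the `gene_group` field, the `pick_canonical_longest` helper and the tie-breaking by accession are all assumptions, and the toy sequences are invented for the example. The sketch shows how two orthologs that share an identical isoform can still end up with canonical sequences of different lengths when only one species has an extra, longer isoform.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Isoform(NamedTuple):
    """Hypothetical stand-in for an unreviewed protein record."""
    accession: str
    gene_group: str  # hypothetical ID of the Gene-Centric group
    sequence: str


def pick_canonical_longest(isoforms: Iterable[Isoform]) -> Dict[str, Isoform]:
    """Pick one canonical isoform per Gene-Centric group: the longest sequence.

    Ties on length are broken by the smallest accession so the result is
    deterministic (an assumption; the rule as stated does not cover ties).
    """
    groups: Dict[str, List[Isoform]] = defaultdict(list)
    for iso in isoforms:
        groups[iso.gene_group].append(iso)
    # Sort key: longer sequence first (negated length), then accession.
    return {
        gene: min(members, key=lambda m: (-len(m.sequence), m.accession))
        for gene, members in groups.items()
    }


# Toy orthologs: both species share the same 10-residue isoform,
# but species A also has a longer isoform with an extra segment.
records = [
    Isoform("A1", "geneX_speciesA", "MKTAYIAKQR"),
    Isoform("A2", "geneX_speciesA", "MKTAYIAKQRGGSW"),
    Isoform("B1", "geneX_speciesB", "MKTAYIAKQR"),
]

canon = pick_canonical_longest(records)
for gene, iso in sorted(canon.items()):
    print(gene, iso.accession, len(iso.sequence))
```

Running it prints `geneX_speciesA A2 14` and `geneX_speciesB B1 10`. The rule is applied per group with no reference to orthologs, so the two species receive canonical sequences of different lengths even though they share an identical isoform.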
Motivation: There now exist thousands of molecular biology databases covering every aspect of biological data. This database infrastructure takes significant effort and funding to develop and maintain. The creators of these databases need to make strong justifications to funders to prove their impact or importance. There are many publication metrics and tools available, such as Google Scholar, which measures citations, and AltMetrics, which draws on multiple measures including social media coverage. Results: In this article, we describe a series of novel...