- Machine Learning in Bioinformatics
- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Protein Structure and Dynamics
- Genomics and Rare Diseases
- Bioinformatics and Genomic Networks
- Microbial Metabolic Engineering and Bioproduction
- Enzyme Structure and Function
- Identification and Quantification in Food
- Genetics, Bioinformatics, and Biomedical Research
- Genomic variations and chromosomal abnormalities
- Genetic diversity and population structure
- Biomedical Text Mining and Ontologies
- Computational Drug Discovery Methods
- CRISPR and Genetic Engineering
- Cancer Genomics and Diagnostics
- Advanced Proteomics Techniques and Applications
- Genetic factors in colorectal cancer
- Evolution and Genetic Dynamics
- Genetics and Neurodevelopmental Disorders
- Glycosylation and Glycoproteins Research
- Insect and Arachnid Ecology and Behavior
- Metabolism and Genetic Disorders
- Plant and animal studies
- Lipid Membrane Structure and Behavior
University of Bologna
2016-2025
Biocom
2015
Zambon (Italy)
2013
Abstract Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a novel and major new development, predictions and assessment goals drove some of the experimental assays, resulting in functional annotations for...
Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating the subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outcomes from the different tools are processed and integrated for annotating the subcellular localization of both eukaryotic and bacterial protein sequences....
Abstract Motivation: Protein function depends on its structural stability. The effects of single point variations on protein stability can elucidate the molecular mechanisms of human diseases and help in developing new drugs. Recently, we introduced INPS, a method suited to predict the effect of variations on protein stability from the protein sequence and whose performance is competitive with the available state-of-the-art tools. Results: In this article, we describe INPS-MD (Impact of Non synonymous variations on Protein Stability-Multi-Dimension), a web server for the prediction...
Abstract Critical evaluation of computational tools for predicting variant effects is important considering their increased use in disease diagnosis and in driving molecular discoveries. In the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, a dataset of 28 STK11 rare variants (27 missense, 1 single amino acid deletion), identified in primary non-small cell lung cancer biopsies, was experimentally assayed to characterize computational methods from four participating teams and five publicly available...
Abstract Motivation: A tool for reliably predicting the impact of variations on protein stability is extremely important for both protein engineering and for understanding the effects of Mendelian and somatic mutations in the genome. Next Generation Sequencing studies are constantly increasing the number of protein sequences. Given the huge disproportion between protein sequences and structures, there is a need for tools suited to annotate the effect of mutations starting from the protein sequence without relying on the structure. Here, we describe INPS, a novel approach for annotating the effect of non-synonymous...
The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as standalone...
Solvent accessibility (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning based approaches trained on solved structures. Here we ask the question as to which extent the solvent exposure of residues can be associated with the pathogenicity of the variation. By this, the SASA of the wild-type residue acquires a role in the context of the functional annotation of single-residue variations (SRVs). By mapping a curated database of human...
The correct localization of proteins in cell compartments is a key issue for their function. Particularly, mitochondrial proteins are physiologically active in different compartments, and their aberrant localization contributes to the pathogenesis of human pathologies. Many computational methods exist to assign protein sequences to subcellular compartments such as the nucleus, the cytoplasm and organelles. However, a substantial lack of experimental evidence in public sequence databases has hampered so far a finer grain discrimination, including also intra-organelle compartments. We...
Abstract Proteins are “social molecules.” Recent experimental evidence supports the notion that large protein aggregates, known as biomolecular condensates, affect structurally and functionally many biological processes. Condensate formation may be permanent and/or time dependent, suggesting that biological processes can occur locally, depending on the cell needs. The question then arises as to which extent we can monitor protein‐aggregate formation, both experimentally and theoretically, and then predict/simulate functional...
Abstract Background: Antimicrobial resistance has been identified as a major threat to global health. The pig food chain is considered an important source of antimicrobial resistance genes (ARGs). However, there is still a lack of knowledge on the dispersion of ARGs in the pig production system, including the external environment. Results: In the present study, we longitudinally followed one swine farm located in Italy from the weaning phase to the slaughterhouse to comprehensively assess the diversity of ARGs, their diffusion, and the bacteria associated...
The knowledge of protein–protein interaction sites (PPIs) is crucial for protein functional annotation. Here we address the problem focusing on the prediction of putative PPIs considering protein sequences as input. The issue is important given the huge volume of protein sequences compared to experimental and/or computed protein structures. Taking advantage of recently developed protein language models and deep neural networks, here we describe ISPRED-SEQ, which outperforms state-of-the-art predictors addressing the same problem. ISPRED-SEQ is freely...
We develop a novel database, Alpha&ESMhFolds, which allows the direct comparison of AlphaFold2 and ESMFold predicted models for 42,942 proteins of the Reference Human Proteome and, when available, their comparison with 2,900 directly associated PDB structures with at least a structure to sequence coverage of 70%. Statistics indicate that good quality models tend to overlap with a TM-score >0.6 as long as some structural information is available. As expected, model superimposition to the PDB structures highlights that AlphaFold2 models are slightly superior to ESMFold ones. However, 55% endowed...
Abstract Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and to suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense...
Abstract Background: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical...
Genetic investigations, boosted by modern sequencing techniques, allow dissecting the genetic component of different phenotypic traits. These efforts result in the compilation of lists of genes related to diseases and show that an increasing number of diseases is associated with multiple genes. Investigating functional relations among genes associated with the same disease contributes to highlighting the molecular mechanisms of the pathogenesis. We present eDGAR, a database collecting and organizing the data on gene/disease associations as derived from OMIM,...
Abstract Motivation: Molecular recognition of N-terminal targeting peptides is the most common mechanism controlling the import of nuclear-encoded proteins into mitochondria and chloroplasts. When experimental information is lacking, computational methods can annotate targeting peptides and determine their cleavage sites for characterizing protein localization, function, and mature protein sequences. The problem of discriminating mitochondrial from chloroplastic propeptides is particularly relevant when annotating proteomes...
The advent of massive DNA sequencing technologies is producing a huge number of human single-nucleotide polymorphisms occurring in protein-coding regions and possibly changing their sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding, bypassing database searches for evolutionary information. We leverage the new encoding schemes for an efficient...
Abstract Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community‐wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer structures, including a balanced mix of physiological and non‐physiological complexes. The non‐physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more...
Coiled-coil domains (CCD) are widespread in all organisms and perform several crucial functions. Given their relevance, the computational detection of CCD is very important for protein functional annotation. State-of-the-art prediction methods include the precise identification of CCD boundaries, the annotation of the typical heptad repeat pattern along the coiled-coil helices, as well as the prediction of the oligomerization state. In this article, we describe CoCoNat, a novel method for predicting coiled-coil helix boundaries, residue-level register annotation, and oligomerization state....
Abstract Motivation: The knowledge of protein stability upon residue variation is an important step for functional protein design and for understanding how protein variants can promote disease onset. Computational methods are important to complement experimental approaches and allow a fast screening of large datasets of variations. Results: In this work we present DDGemb, a novel method combining protein language model embeddings and transformer architectures to predict protein ΔΔG upon both single- and multi-point variations. DDGemb has been trained on a high-quality dataset...
This study reports draft genomes of 30 bacteria representative of the plant food system microbiota and isolated from different sources in Italy and France. Individual genomes were reconstructed using PacBio DNA sequencing: taxonomic classification and the distribution of genes involved in microbe-environment interactions are reported to facilitate the strains' characterization and utilization.
AlphaFold2 predicts protein structures from structural and functional knowledge. Alternatively, ESMFold does the same adopting protein language models. Here, we map the available Pfam domains on the pairs of protein models of the human reference proteome computed with both procedures and compare the mapped regions relevant for functional annotation. We find that, rather irrespectively of the global superimposition of the pairwise models, Pfam-containing regions overlap with a TM-score above 0.8 and a predicted local distance difference test (pLDDT) which is higher than...