- Gene expression and cancer classification
- Bioinformatics and Genomic Networks
- Genomics and Phylogenetic Studies
- SARS-CoV-2 and COVID-19 Research
- Biomedical Text Mining and Ontologies
- Cancer Genomics and Diagnostics
- Computational Drug Discovery Methods
- Algorithms and Data Compression
- Vaccines and immunoinformatics approaches
- Genomics and Chromatin Dynamics
- Machine Learning in Bioinformatics
- Scientific Computing and Data Management
- COVID-19 Clinical Research Studies
- Fiber-reinforced polymer composites
- PARP inhibition in cancer therapy
- Single-cell and spatial transcriptomics
- Epigenetics and DNA Methylation
- Semantic Web and Ontologies
- Genetics, Bioinformatics, and Biomedical Research
- Evolutionary Algorithms and Applications
- Additive Manufacturing and 3D Printing Technologies
- Bacteriophages and microbial interactions
- Data Mining Algorithms and Applications
- RNA modifications and cancer
- Mechanical Behavior of Composites
Politecnico di Milano
2016-2025
Stanford University
2023
Center for Genomic Science
2015
Italian Institute of Technology
2015
University of Cyprus
2013
Chinese University of Hong Kong
2013
Applied Multilayers (United Kingdom)
2013
Lockheed Martin (United States)
1968-1982
Science Research Laboratory
1969-1975
Motivation: Improvement of sequencing technologies and data processing pipelines is rapidly providing data, with associated high-level features, for many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art ‘big data’ computing strategies, with abstraction levels beyond available tool capabilities. Results: We propose a high-level, declarative GenoMetric Query Language...
Within the GEN-COVID Multicenter Study, biospecimens from more than 1000 SARS-CoV-2 positive individuals have thus far been collected in the GEN-COVID Biobank (GCB). Sample types include whole blood, plasma, serum, leukocytes, and DNA. The GCB links samples to the detailed clinical data available in the GEN-COVID Patient Registry (GCPR). It includes hospitalized patients (74.25%), broken down into intubated, treated by CPAP-biPAP, with O...
We previously proposed a paradigm shift in genomic data management, based on the Genomic Data Model (GDM) for mediating existing formats and on the GenoMetric Query Language (GMQL) for supporting, at a high level of abstraction, data extraction and the most common data-driven computations required by tertiary analysis of Next Generation Sequencing datasets. Here, we present a new GMQL-based system with enhanced accessibility, portability, scalability and performance. The system has a well-designed modular architecture featuring: (i) an...
ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from the original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf also contains other virus species...
Classical drug design methodologies are hugely costly and time-consuming, with approximately 85% of newly proposed molecules failing in the first three phases of the FDA approval process. Thus, strategies that leverage computational methods to find alternative indications for already approved drugs are of crucial relevance. We previously demonstrated the efficacy of Non-negative Matrix Tri-Factorization, a method that allows exploiting both data integration and machine learning, to infer novel indications for approved drugs. In this work, we present...
Background: SARS-CoV-2 viremia has been found to be a potential prognostic factor in patients hospitalized for COVID-19. Objective: We aimed to assess the association between SARS-CoV-2 viremia and mortality in patients with COVID-19 during different epidemic periods. Methods: A prospective registry was queried to extract all patients with an available viremia measurement performed at hospital admission from March 2020 to January 2022. Viremia was assessed by means of the GeneFinder™ Plus RealAmp Kit assay/ELITe MGB® assay, using a cycle threshold of <45 to define positivity. Uni- and multivariable...
Variant visualization plays an important role in supporting viral evolution analysis, which has been extremely valuable during the COVID-19 pandemic. VirusViz is a web-based application for comparing variants of selected viral populations and their sub-populations; it is primarily focused on SARS-CoV-2 variants, although the tool also supports other species (SARS-CoV, MERS-CoV, Dengue, Ebola). As input, VirusViz imports the results of queries extracting metadata from the large database ViruSurf, which integrates information about...
A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and which measures meaningfully evaluate their performance are open questions. We address these questions by analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts arising from differences in data collection and processing over time, the amount of available data, ...
Consistency and completeness of biomolecular annotations are key to the correct interpretation of biological experiments. Yet, the associations between genes (or proteins) and features that are correctly annotated are just some of all existing ones. As time goes by, they increase in number and become more useful, but they remain incomplete, and some of them are incorrect. To support and quicken their time-consuming curation procedure and to improve the consistency of available annotations, computational methods that are able to supply a ranked list of predicted...
Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from being definitive and it rapidly evolves, and erroneous annotations may be present. Since...
Gene function annotations are key elements in biology and bioinformatics. A typical annotation is the association between a gene and a feature term that describes a functional feature of the gene by using a controlled vocabulary (e.g., a Gene Ontology (GO) term). Unfortunately, available annotations may contain errors, and even biologically validated ones are incomplete by definition, since new knowledge is continuously discovered. Thus, computational algorithms that are able to provide ranked lists of predicted annotations are an excellent contribution to bioinformatics research...
Next Generation Sequencing (NGS) is a family of technologies for reading DNA or RNA, capable of producing whole genome sequences at an impressive speed, and causing a revolution in both biological research and medical practice. In this exciting scenario, while a huge number of specialized bioinformatics programs extract information from sequences, there is an increasing need for a new generation of systems and frameworks capable of integrating such information, providing holistic answers to the needs of biologists and clinicians. To respond...
Breast cancer comprises multiple subtypes implicated in prognosis. Existing stratification methods rely on the expression quantification of small gene sets. Next Generation Sequencing promises large amounts of omic data in the next years. In this scenario, we explore the potential of machine learning and, particularly, deep learning for breast cancer subtyping. Due to the paucity of publicly available data, we leverage pan-cancer and non-cancer data to design semi-supervised settings. We make use of multi-omic data, including microRNA...
Genomic annotations with functional controlled terms, such as the Gene Ontology (GO) ones, are paramount in modern biology. Yet, they are known to be incomplete, since current biological knowledge is far from definitive. In this scenario, computational methods that are able to support and quicken the curation of these annotations can be very useful. In a previous work, we discussed the benefits of using the Probabilistic Latent Semantic Analysis algorithm in order to predict novel GO annotations, compared with some Singular Value Decomposition (SVD)...
Genomic annotations describing functional features of genes and proteins through controlled terminologies and ontologies are extremely valuable, especially for computational analyses aimed at inferring new biomedical knowledge. Thanks to the biology revolution led by the introduction of novel DNA sequencing technologies, several repositories of such annotations have become available in the last decade; among them, the ones including Gene Ontology annotations are the most relevant. Nevertheless, the set of available genomic annotations is incomplete, and only some of them represent...
Most scientific databases consist of datasets (or sources) which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata, describing the process that leads to building them (e.g., the experimental conditions used during sample generation). Metadata are typically used during computations just for initial data selection; at most, metadata about query results are recovered after executing the query, and then examined through post-processing. In this way, a large body of information...
The ongoing evolution of SARS-CoV-2 and the rapid emergence of variants of concern at distinct geographic locations have relevant implications for the implementation of strategies for controlling the COVID-19 pandemic. Combining the growing body of data and evidence on potential functional mutations can suggest highly effective methods for the prioritization of novel variants of concern, e.g., those increasing in frequency locally and/or globally. However, these analyses may be complex, requiring the integration of different resources. We claim the need for a...
Statins, widely used cardiovascular drugs that lower cholesterol by inhibiting HMG-CoA reductase, have been increasingly recognized for their potential anticancer properties. This study elucidates the underlying mechanism, revealing that statins exploit Synthetic Lethality (SL), a principle whereby the co-occurrence of two non-lethal events leads to cell death. Our computational analysis of approximately 37,000 SL pairs identified statins as targeting genes involved in SL interactions with metastatic genes. In vitro...
We are developing a new, holistic data management system for genomics, which uses cloud-based computing for querying thousands of heterogeneous genomic datasets. In our project, it is essential to leverage a modern cloud framework, so as to encode query expressions into high-level operations provided by the framework. After releasing a first implementation using Pig and Hadoop 1, we are currently targeting Spark and Flink, two emerging frameworks for general-purpose big data analytics. While one of them appears to have stronger...
EpiSurf is a Web application for selecting viral populations of interest and then analyzing how their amino acid changes are distributed along epitopes. Viral sequences are searched within ViruSurf, which stores curated metadata imported from the most widely used deposition sources and databases (GenBank, COVID-19 Genomics UK (COG-UK) and Global initiative on sharing all influenza data (GISAID)). Epitopes are taken from the open-source Immune Epitope Database or directly proposed by users, who indicate their start and stop positions...