- Gene expression and cancer classification
- Genomics and Phylogenetic Studies
- Biomedical Text Mining and Ontologies
- Bioinformatics and Genomic Networks
- SARS-CoV-2 and COVID-19 Research
- Machine Learning in Bioinformatics
- Cancer Genomics and Diagnostics
- Vaccines and immunoinformatics approaches
- Scientific Computing and Data Management
- Genetic Associations and Epidemiology
- Single-cell and spatial transcriptomics
- Algorithms and Data Compression
- Bacteriophages and microbial interactions
- Epigenetics and DNA Methylation
- Semantic Web and Ontologies
- Genomics and Chromatin Dynamics
- RNA modifications and cancer
- Evolutionary Algorithms and Applications
- Respiratory Support and Mechanisms
- COVID-19 diagnosis using AI
- Machine Learning and Data Classification
- Genomics and Rare Diseases
- Cardiac Arrest and Resuscitation
- Data Mining Algorithms and Applications
- Liver Disease Diagnosis and Treatment
Politecnico di Milano
2013-2025
Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico
2022-2025
Stanford University
2023
University of Cyprus
2013
Chinese University of Hong Kong
2013
Applied Multilayers (United Kingdom)
2013
Importance: Data on the association of COVID-19 vaccination with intensive care unit (ICU) admission and outcomes of patients with SARS-CoV-2–related pneumonia are scarce. Objective: To evaluate whether COVID-19 vaccination is associated with preventing ICU admission for COVID-19 pneumonia and to compare baseline characteristics and outcomes of vaccinated and unvaccinated patients admitted to an ICU. Design, Setting, and Participants: This retrospective cohort study on regional data sets reports: (1) the daily number of administered vaccines and (2) data on all consecutive patients admitted to an ICU in Lombardy, Italy, from August 1 to December...
We previously proposed a paradigm shift in genomic data management, based on the Genomic Data Model (GDM) for mediating existing data formats and the GenoMetric Query Language (GMQL) for supporting, at a high level of abstraction, data extraction and the most common data-driven computations required by tertiary data analysis of Next Generation Sequencing datasets. Here, we present a new GMQL-based system with enhanced accessibility, portability, scalability and performance. The new system has a well-designed modular architecture featuring: (i) an...
Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes with very limited capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited ontologies. The...
ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species...
Variant visualization plays an important role in supporting the viral evolution analysis, extremely valuable during the COVID-19 pandemic. VirusViz is a web-based application for comparing variants of selected viral populations and their sub-populations; it is primarily focused on SARS-CoV-2 variants, although the tool also supports other viral species (SARS-CoV, MERS-CoV, Dengue, Ebola). As input, VirusViz imports the results of queries extracting variants and metadata from the large database ViruSurf, which integrates information about...
A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data,...
Background: Mechanical power (MP) serves as a crucial predictive indicator for ventilator-induced lung injury and plays a pivotal role in tailoring the management of mechanical ventilation. However, its application across different diseases and stages remains nuanced. Methods: Using AmsterdamUMCdb, we conducted a retrospective study to analyze the causal relationship between MP and outcomes of invasive mechanical ventilation, specifically the SpO2/FiO2 ratio (P/F) and ventilator-free days at day 28 (VFD28). We employed...
Thousands of new experimental datasets are becoming available every day; in many cases, they are produced within the scope of large cooperative efforts, involving a variety of laboratories spread all over the world, and typically open for public use. Although the potential collective amount of available information is huge, the effective combination of such public sources is hindered by data heterogeneity, as the datasets exhibit a wide variety of notations and formats, concerning both experimental values and metadata. Thus, data integration is becoming a fundamental activity, to be performed prior...
The integration of genomic metadata is, at the same time, an important, difficult, and well-recognized challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research; combining information from various heterogeneous and widely dispersed sources is paramount to a number of biological discoveries. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and ontologies. It is well-recognized in the bioinformatics community because, in common practice, repositories are...
Understanding complex biological phenomena involves answering complex biomedical questions on multiple biomolecular information simultaneously, which are expressed through multiple genomic and proteomic semantic annotations scattered in many distributed and heterogeneous data sources; such heterogeneity and dispersion hamper the biologists' ability of asking global queries and performing global evaluations. To overcome this problem, we developed a software architecture to create and maintain a Genomic and Proteomic Knowledge Base (GPKB),...
Breast Cancer comprises multiple subtypes implicated in prognosis. Existing stratification methods rely on the expression quantification of small gene sets. Next Generation Sequencing promises large amounts of omic data in the next years. In this scenario, we explore the potential of machine learning and, particularly, deep learning for breast cancer subtyping. Due to the paucity of publicly available data, we leverage pan-cancer and non-cancer data to design semi-supervised settings. We make use of multi-omic data, including microRNA...
Next Generation Sequencing technologies have produced a substantial increase of publicly available genomic data and related clinical/biospecimen information. New models and methods to easily access, integrate and search them effectively are needed. An effort was made by the Genomic Data Commons (GDC), which defined strict procedures for harmonizing genomic and clinical data of cancer, and created the GDC data portal with its application programming interface (API). In this work, we enhance GDC harmonization by applying a state of the art data model...
EpiSurf is a Web application for selecting viral populations of interest and then analyzing how their amino acid changes are distributed along epitopes. Viral sequences are searched within ViruSurf, which stores curated metadata and sequences imported from the most widely used deposition sources for viral databases (GenBank, COVID-19 Genomics UK (COG-UK) and Global initiative on sharing all influenza data (GISAID)). Epitopes are searched within the open source Immune Epitope Database or directly proposed by users by indicating their start and stop positions...
The ongoing evolution of SARS-CoV-2 and the rapid emergence of variants of concern at distinct geographic locations have relevant implications for the implementation of strategies for controlling the COVID-19 pandemic. Combining the growing body of data and the evidence on potential functional implications of SARS-CoV-2 mutations can suggest highly effective methods for the prioritization of novel variants of potential concern, e.g. increasing in frequency locally and/or globally. However, these analyses may be complex, requiring the integration of different data and resources. We claim the need for a...
Background: With the growth of available sequenced datasets, analysis of heterogeneous processed data can answer increasingly relevant biological and clinical questions. Scientists are challenged in performing efficient and reproducible data extraction and analysis pipelines over heterogeneously processed datasets. Available software packages are suitable for analyzing experimental files from such datasets one by one, but do not scale to thousands of experiments. Moreover, they lack proper support for metadata manipulation....
Biomedical questions are often complex and address multiple topics simultaneously. Answering them requires the comprehensive evaluation of several different types of data. They are available, but in distributed and heterogeneous data sources; this hampers their global evaluation. We developed a software architecture to create and maintain an updated Genomic and Proteomic Data Warehouse (GPDW), which integrates the main such dispersed data. It uses a modular and multi-level schema based on abstraction and generalization of the integrated...
With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information for building new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools. This work presents GeCoAgent, a big-data service...