- Bioinformatics and Genomic Networks
- Cancer Genomics and Diagnostics
- Data Mining Algorithms and Applications
- Gene expression and cancer classification
- Algorithms and Data Compression
- Data Management and Algorithms
- Rough Sets and Fuzzy Logic
- Genomics and Phylogenetic Studies
- Genomics and Rare Diseases
- Genetic factors in colorectal cancer
- Imbalanced Data Classification Techniques
- Complex Network Analysis Techniques
- Advanced Graph Neural Networks
- Epigenetics and DNA Methylation
- Genetic Associations and Epidemiology
- RNA modifications and cancer
- Machine Learning and Algorithms
- Advanced Database Systems and Queries
- Genomics and Chromatin Dynamics
- Computational Drug Discovery Methods
- Data Quality and Management
- Data Stream Mining Techniques
- Genomic variations and chromosomal abnormalities
- Optimization and Search Problems
- Bayesian Modeling and Causal Inference
- University of Padua (2015-2024)
- Brown University (2010-2019)
- University of Southern Denmark (2013-2019)
- National Center for Biotechnology Information (2019)
- Research Network (United States) (2017)
- Providence College (2011-2016)
- John Brown University (2010-2015)
- National Institutes of Health (2014)
- Walter and Eliza Hall Institute of Medical Research (2011)
Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. AML genomes have fewer mutations than most other cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total...
The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127...
Next-generation DNA sequencing technologies are enabling genome-wide measurements of somatic mutations in large numbers of cancer patients. A major challenge in the interpretation of these data is to distinguish functional “driver mutations” important for cancer development from random “passenger mutations.” A common approach for identifying driver mutations is to find genes that are mutated at significant frequency in a large cohort of cancer genomes. This approach is confounded by the observation that driver mutations target multiple cellular signaling and regulatory pathways. Thus,...
Recent genome sequencing studies have shown that the somatic mutations that drive cancer development are distributed across a large number of genes. This mutational heterogeneity complicates efforts to distinguish functional mutations from sporadic, passenger mutations. Since cancer mutations are hypothesized to target a relatively small number of cellular signaling and regulatory pathways, a common practice is to assess whether known pathways are enriched for mutated genes. We introduce an alternative approach that examines mutated genes in the context of genome-scale gene...
Cancer is a heterogeneous disease with different combinations of genetic alterations driving its development in different individuals. We introduce CoMEt, an algorithm to identify combinations of alterations that exhibit a pattern of mutual exclusivity across individuals, often observed for alterations in the same pathway. CoMEt includes an exact statistical test for mutual exclusivity and techniques to perform simultaneous analysis of multiple sets of mutually exclusive and subtype-specific alterations. We demonstrate that CoMEt outperforms existing approaches on simulated and real data. We apply CoMEt to five...
Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the “random walk facility location” (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by...
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We...
Sensor-based human activity recognition (HAR) requires predicting the action of a person based on sensor-generated time series data. HAR has attracted major interest in the past few years, thanks to the large number of applications enabled by modern ubiquitous computing devices. While several techniques based on hand-crafted feature engineering have been proposed, the current state-of-the-art is represented by deep learning architectures that automatically obtain high level representations and that use recurrent neural...
Gene expression profiles have been extensively discussed as an aid to guide therapy by predicting disease outcome for patients suffering from complex diseases, such as cancer. However, prediction models built upon single-gene (SG) features show poor stability and performance on independent datasets. Attempts to mitigate these drawbacks have led to the development of network-based approaches that integrate pathway information to produce meta-gene (MG) features. Also, MG approaches have only dealt with the two-class problem of good...
An uncertain graph 𝒢 = (V, E, p : E → (0, 1]) can be viewed as a probability space whose outcomes (referred to as possible worlds) are subgraphs of 𝒢 where any edge e ∈ E occurs with probability p(e), independently of the other edges. These graphs naturally arise in many application domains where data management systems are required to cope with uncertainty in interrelated data, such as computational biology, social network analysis, network reliability, and privacy enforcement, among others. For this reason, it is important to devise fundamental...
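The possible-world view above can be made concrete with a small Monte Carlo sketch: draw subgraphs by keeping each edge independently with its probability, then count how often two nodes end up connected. This is only an illustration of the definition, not the method of the cited work; the function names and the `(u, v, p)` edge representation are assumptions chosen for the example.

```python
import random
from collections import defaultdict, deque


def sample_possible_world(edges, rng):
    """Draw one possible world: keep each edge (u, v, p) independently with probability p."""
    return [(u, v) for u, v, p in edges if rng.random() < p]


def connected(world, source, target):
    """Breadth-first search over the undirected edges of one sampled world."""
    adjacency = defaultdict(list)
    for u, v in world:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def estimate_connection_probability(edges, source, target, samples=10000, seed=0):
    """Fraction of sampled possible worlds in which source and target are connected."""
    rng = random.Random(seed)
    hits = sum(
        connected(sample_possible_world(edges, rng), source, target)
        for _ in range(samples)
    )
    return hits / samples
```

For example, `estimate_connection_probability([("a", "b", 0.5)], "a", "b")` should return a value close to 0.5, since the single edge is present in about half of the sampled worlds.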
Motivated by applications that concern graphs that are evolving and massive in nature, we define a new general framework for computing with such graphs. In our framework, the graph changes over time and an algorithm can only track these changes by explicitly probing the graph. This framework captures the inherent tradeoff between the complexity of maintaining an up-to-date view of the graph and the quality of results computed with the available view. We apply this framework to two classical graph connectivity problems, namely, path connectivity and minimum spanning trees, and obtain efficient algorithms.
Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from this data is to determine whether the temporal order of somatic mutations in a cancer follows any common progression. Since we usually obtain only one sample from a patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in the somatic mutations across different patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods...
A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because: the two populations determined by a genetic variant may have very different sizes; and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data where...
We present SPuManTE, an efficient algorithm for mining significant patterns from a transactional dataset. SPuManTE controls the Family-wise Error Rate: it ensures that the probability of reporting one or more false discoveries is less than a user-specified threshold. A key ingredient of SPuManTE is UT, our novel unconditional statistical test for evaluating the significance of a pattern, which requires fewer assumptions on the data generation process and is more appropriate for a knowledge discovery setting than classical conditional tests, such as...
The identification of significant patterns, defined as patterns whose frequency significantly deviates from what is expected under a suitable null model of the data, is a key data mining task with application in several areas. We present PROMISE, an algorithm for identifying significant sequential patterns while guaranteeing that the probability that one or more false discoveries are reported in output (i.e., the Family-Wise Error Rate, FWER) is less than a user-defined threshold. PROMISE employs the Westfall-Young method to correct for multiple...
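As a rough illustration of how a Westfall-Young style correction bounds the FWER, the sketch below works in a simplified setting that is an assumption of this example and not PROMISE's null model: each pattern is given as the set of transaction indices where it occurs, transactions carry binary class labels, and the null distribution is obtained by permuting the labels. The helper names and the use of a one-sided Fisher exact test are also assumptions.

```python
import random
from math import comb


def fisher_p(support, hits, n, n1):
    """One-sided Fisher p-value P[X >= hits], X hypergeometric:
    `support` of the n transactions contain the pattern, n1 transactions have label 1."""
    upper = min(support, n1)
    tail = sum(comb(support, k) * comb(n - support, n1 - k) for k in range(hits, upper + 1))
    return tail / comb(n, n1)


def pattern_p_values(occurrences, labels):
    """p-value of every pattern; occurrences[i] is the set of transaction indices of pattern i."""
    n, n1 = len(labels), sum(labels)
    return [fisher_p(len(occ), sum(labels[i] for i in occ), n, n1) for occ in occurrences]


def westfall_young_threshold(occurrences, labels, alpha=0.05, permutations=1000, seed=0):
    """Threshold t such that at most a fraction alpha of label permutations
    produce any pattern with p-value strictly below t."""
    rng = random.Random(seed)
    shuffled = list(labels)
    min_ps = []
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permuting labels breaks any pattern/label association
        min_ps.append(min(pattern_p_values(occurrences, shuffled)))
    min_ps.sort()
    return min_ps[int(alpha * permutations)]


def significant_patterns(occurrences, labels, alpha=0.05, permutations=1000, seed=0):
    """Indices of patterns whose observed p-value is strictly below the permutation threshold."""
    threshold = westfall_young_threshold(occurrences, labels, alpha, permutations, seed)
    observed = pattern_p_values(occurrences, labels)
    return [i for i, p in enumerate(observed) if p < threshold]
```

The strict inequality against the sorted minimum p-values keeps the estimated FWER at or below `alpha` even when permutations produce tied p-values. Recomputing every p-value on every permutation is the naive approach; it is used here only to keep the sketch short.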
As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions...
The extraction of patterns displaying significant association with a class label is a key data mining task with wide application in many domains. We study a variant of the problem that requires mining the top-k statistically significant patterns, thus providing tight control on the number of patterns reported in output. We develop TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output, and provide theoretical evidence of its effectiveness. TopKWY crucially relies on a novel strategy to explore statistically significant patterns and on several implementation...