Fabio Vandin

ORCID: 0000-0003-2244-2320
Research Areas
  • Bioinformatics and Genomic Networks
  • Cancer Genomics and Diagnostics
  • Data Mining Algorithms and Applications
  • Gene expression and cancer classification
  • Algorithms and Data Compression
  • Data Management and Algorithms
  • Rough Sets and Fuzzy Logic
  • Genomics and Phylogenetic Studies
  • Genomics and Rare Diseases
  • Genetic factors in colorectal cancer
  • Imbalanced Data Classification Techniques
  • Complex Network Analysis Techniques
  • Advanced Graph Neural Networks
  • Epigenetics and DNA Methylation
  • Genetic Associations and Epidemiology
  • RNA modifications and cancer
  • Machine Learning and Algorithms
  • Advanced Database Systems and Queries
  • Genomics and Chromatin Dynamics
  • Computational Drug Discovery Methods
  • Data Quality and Management
  • Data Stream Mining Techniques
  • Genomic variations and chromosomal abnormalities
  • Optimization and Search Problems
  • Bayesian Modeling and Causal Inference

University of Padua
2015-2024

Brown University
2010-2019

University of Southern Denmark
2013-2019

National Center for Biotechnology Information
2019

Research Network (United States)
2017

Providence College
2011-2016

John Brown University
2010-2015

National Institutes of Health
2014

Walter and Eliza Hall Institute of Medical Research
2011

10.1038/nature10166 article EN Nature 2011-06-28
T J Ley, Christopher A. Miller, Li Ding, Benjamin J. Raphael, Andrew J. Mungall, Gordon L. Robertson, Katherine A. Hoadley, Timothy J. Triche, Peter W. Laird, Jack Baty, Lucinda Fulton, Robert S. Fulton, Sharon E. Heath, Joelle Kalicki-Veizer, Cyriac Kandoth, Jeffery M. Klco, Daniel C. Koboldt, Krishna Kanchi, Shashikant Kulkarni, Tamara Lamprecht, David E. Larson, Ge Lin, Charles Lu, Michael D. McLellan, Joshua F. McMichael, Jacqueline E. Payton, Heather K. Schmidt, David H. Spencer, Michael H. Tomasson, John W. Wallis, Lukas D. Wartman, Mark A. Watson, John S. Welch, Michael C. Wendl, Adrian Ally, Miruna Balasundaram, İnanç Birol, Yaron S.N. Butterfield, Readman Chiu, Andy Chu, Eric Chuah, Hye Jung E. Chun, Richard Corbett, Noreen Dhalla, Ranabir Guin, Anyuan He, Carrie Hirst, Martin Hirst, Robert A. Holt, Steven J.M. Jones, Aly Karsan, Darlene Lee, Haiyan I. Li, Marco A. Marra, Michael Mayo, Richard A. Moore, Karen Mungall, Jeremy Parker, Erin Pleasance, Patrick Plettner, Jacquie Schein, Dominik Stoll, Lucas Swanson, Angela Tam, Nina Thiessen, Richard Varhol, Natasja Wye, Yongjun Zhao, Stacey Gabriel, Gad Getz, Carrie Sougnez, Lihua Zou, Mark D.M. Leiserson, Fabio Vandin, Hsin Ta Wu, Frederick R. Applebaum, Stephen B. Baylin, Rehan Akbani, Bradley M. Broom, Ken Chen, Thomas Motter, Khanh Cong Nguyen, John N. Weinstein, Nianziang Zhang, Martin L. Ferguson, Christopher M. Adams, Aaron Black, Jay Bowen, Julie M. Gastier‐Foster, Thomas W. Grossman, Tara M. Lichtenberg, Lisa Wise, Tanja M. Davidsen, John A. Demchok, Kenna Shaw, Margi Sheth, Heidi J. Sofia, Liming Yang, James R. Downing, Greg Eley

Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. AML genomes have fewer mutations than most other cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total...

10.1056/nejmoa1301689 article EN New England Journal of Medicine 2013-05-02

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127...

10.1038/nature12634 article EN cc-by-nc-sa Nature 2013-10-15

Next-generation DNA sequencing technologies are enabling genome-wide measurements of somatic mutations in large numbers of cancer patients. A major challenge in the interpretation of these data is to distinguish functional "driver mutations" important for cancer development from random "passenger mutations." A common approach for identifying driver mutations is to find genes that are mutated at significant frequency in a large cohort of cancer genomes. This approach is confounded by the observation that driver mutations target multiple cellular signaling and regulatory pathways. Thus,...

10.1101/gr.120477.111 article EN cc-by-nc Genome Research 2011-06-07

Recent genome sequencing studies have shown that the somatic mutations that drive cancer development are distributed across a large number of genes. This mutational heterogeneity complicates efforts to distinguish functional mutations from sporadic, passenger mutations. Since cancer mutations are hypothesized to target a relatively small number of cellular signaling and regulatory pathways, a common practice is to assess whether known pathways are enriched for mutated genes. We introduce an alternative approach that examines mutated genes in the context of a genome-scale gene...
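The abstract above analyzes mutated genes in the context of a genome-scale interaction network. A minimal sketch of one common way to do this, network propagation by random walk with restart, is shown below. The kernel, the `restart` value, and the function name `propagate` are assumptions for illustration, not the diffusion process defined in the paper.

```python
import numpy as np


def propagate(adj: np.ndarray, scores: np.ndarray, restart: float = 0.4) -> np.ndarray:
    """Spread per-gene mutation scores over an interaction network.

    Column j of the influence matrix is the stationary distribution of a walker
    that starts at gene j and, at each step, jumps back to gene j with
    probability `restart` and otherwise moves to a random neighbour. The result
    weights these columns by the input gene scores.
    """
    degree = adj.sum(axis=0)
    if np.any(degree == 0):
        raise ValueError("every gene needs at least one interaction")
    walk = adj / degree  # column-stochastic transition matrix
    n = adj.shape[0]
    # Closed form: F = restart * (I - (1 - restart) * W)^(-1)
    influence = restart * np.linalg.inv(np.eye(n) - (1 - restart) * walk)
    return influence @ scores


# Toy path network 0-1-2-3 with a single mutated gene (gene 0).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
heat = propagate(path, np.array([1.0, 0.0, 0.0, 0.0]))
```

Because the transition matrix is column-stochastic, the total score is conserved and decays with network distance from the mutated gene.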

10.1089/cmb.2010.0265 article EN Journal of Computational Biology 2011-03-01

Cancer is a heterogeneous disease with different combinations of genetic alterations driving its development in different individuals. We introduce CoMEt, an algorithm to identify combinations of alterations that exhibit a pattern of mutual exclusivity across individuals, often observed for alterations in the same pathway. CoMEt includes an exact statistical test for mutual exclusivity and techniques to perform simultaneous analysis of multiple sets of mutually exclusive and subtype-specific alterations. We demonstrate that CoMEt outperforms existing approaches on simulated and real data. We apply CoMEt to five...
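CoMEt's exact test covers sets of several genes. The simplest case, two genes, can be sketched with a one-sided Fisher's exact test on the number of patients mutated in both genes: under the null, with each gene's mutation count fixed, that number is hypergeometric. This is an illustrative pairwise test with a hypothetical function name, not CoMEt's implementation.

```python
from math import comb


def exclusivity_pvalue(gene_a: list[bool], gene_b: list[bool]) -> float:
    """One-sided Fisher's exact test for mutual exclusivity of two genes.

    With the number of patients mutated in each gene held fixed, the number
    mutated in both is hypergeometric; P(X <= observed) is small when the two
    genes co-occur less often than expected by chance.
    """
    if len(gene_a) != len(gene_b):
        raise ValueError("mutation vectors must cover the same patients")
    n = len(gene_a)
    k_a = sum(gene_a)
    k_b = sum(gene_b)
    both = sum(a and b for a, b in zip(gene_a, gene_b))
    # P(X = x) = C(k_a, x) * C(n - k_a, k_b - x) / C(n, k_b)
    tail = sum(comb(k_a, x) * comb(n - k_a, k_b - x) for x in range(both + 1))
    return tail / comb(n, k_b)


# Ten patients: gene A mutated in the first five, gene B in the last five.
mutated_a = [True] * 5 + [False] * 5
mutated_b = [False] * 5 + [True] * 5
p = exclusivity_pvalue(mutated_a, mutated_b)  # equals 1/252: no patient carries both
```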

10.1186/s13059-015-0700-7 article EN cc-by Genome biology 2015-08-07

Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by...

10.1101/gr.221218.117 article EN cc-by-nc Genome Research 2017-07-18

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We...

10.1101/043430 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2016-03-12

Sensor-based human activity recognition (HAR) requires predicting the action of a person based on sensor-generated time series data. HAR has attracted major interest in the past few years, thanks to the large number of applications enabled by modern ubiquitous computing devices. While several techniques based on hand-crafted feature engineering have been proposed, the current state of the art is represented by deep learning architectures that automatically obtain high-level representations and use recurrent neural...

10.1109/jsen.2021.3067690 article EN IEEE Sensors Journal 2021-03-22

Gene expression profiles have been extensively discussed as an aid to guide therapy by predicting disease outcome for patients suffering from complex diseases, such as cancer. However, prediction models built upon single-gene (SG) features show poor stability and performance on independent datasets. Attempts to mitigate these drawbacks have led to the development of network-based approaches that integrate pathway information to produce meta-gene (MG) features. Also, MG approaches have only dealt with the two-class problem of good...
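As a hedged illustration of how pathway information can yield meta-gene features, the sketch below standardizes each gene and averages the z-scores of a pathway's member genes. This averaging rule and the function name `metagene_features` are assumptions; the network-based approaches discussed in the abstract may combine genes differently.

```python
import numpy as np


def metagene_features(expr: np.ndarray, gene_names: list[str],
                      pathways: dict[str, list[str]]) -> np.ndarray:
    """Turn a samples x genes expression matrix into samples x pathways features.

    Each output column is the mean z-score of the pathway's measured member genes;
    genes with zero variance get z-score 0.
    """
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - mu) / sd
    index = {g: i for i, g in enumerate(gene_names)}
    columns = []
    for name, members in pathways.items():
        idx = [index[g] for g in members if g in index]
        if not idx:
            raise ValueError(f"pathway {name!r} has no measured genes")
        columns.append(z[:, idx].mean(axis=1))
    return np.column_stack(columns)
```

Genes listed in a pathway but absent from the expression matrix are skipped rather than treated as errors.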

10.1093/nar/gkx642 article EN cc-by-nc Nucleic Acids Research 2017-07-13

An uncertain graph 𝒢 = (V, E, p : E → (0, 1]) can be viewed as a probability space whose outcomes (referred to as possible worlds) are subgraphs of 𝒢 where any edge e ∈ E occurs with probability p(e), independently of the other edges. These graphs naturally arise in many application domains where data management systems are required to cope with uncertainty in interrelated data, such as computational biology, social network analysis, network reliability, and privacy enforcement, among others. For this reason, it is important to devise fundamental...
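The possible-worlds semantics can be made concrete by sampling. The sketch below estimates the probability that two vertices are connected (a reliability query) by drawing possible worlds at random and checking connectivity with breadth-first search. It illustrates the probability space defined above under its edge-independence assumption; the function names are hypothetical and this is not the algorithm developed in the paper.

```python
import random
from collections import deque


def sample_world(edges, rng):
    """One possible world: keep each edge (u, v, p) independently with probability p."""
    return [(u, v) for (u, v, p) in edges if rng.random() < p]


def connected(world_edges, s, t):
    """Breadth-first search from s on an undirected edge list."""
    adj = {}
    for u, v in world_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen = {s}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def reliability(edges, s, t, samples=10_000, seed=0):
    """Monte Carlo estimate of Pr[s and t are connected in a random possible world]."""
    rng = random.Random(seed)
    hits = sum(connected(sample_world(edges, rng), s, t) for _ in range(samples))
    return hits / samples
```

For example, `reliability([("a", "b", 0.5)], "a", "b")` should be close to 0.5; the estimate's standard error shrinks as `samples` grows.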

10.1145/3186728.3164143 article EN Proceedings of the VLDB Endowment 2017-12-01

Motivated by applications that concern graphs that are evolving and massive in nature, we define a new general framework for computing with such graphs. In our framework, the graph changes over time and an algorithm can only track these changes by explicitly probing the graph. This framework captures the inherent tradeoff between the complexity of maintaining an up-to-date view of the graph and the quality of the results computed with the available view. We apply this framework to two classical graph connectivity problems, namely, path connectivity and minimum spanning trees, and obtain efficient algorithms.

10.1145/2090236.2090249 article EN 2012-01-08

Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from this data is to determine whether the temporal order of somatic mutations in a cancer follows any common progression. Since we usually obtain only one sample from a patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in somatic mutations across different patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods...

10.1089/cmb.2014.0161 article EN Journal of Computational Biology 2015-03-18

A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many applications. This is because: the two populations determined by a variant may have very different sizes; and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem in cancer data where...
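For reference, here is the standard two-sample log-rank statistic with the asymptotic normal approximation that the abstract identifies as problematic for very unbalanced groups and very small p-values. It is a plain sketch with a hypothetical function name, not the exact computation the paper proposes.

```python
import math


def logrank_asymptotic(times, events, group):
    """Two-sample log-rank test using the normal approximation.

    times:  follow-up time of each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    group:  1 if the patient carries the variant, 0 otherwise
    Returns (z, two-sided p-value).
    """
    data = list(zip(times, events, group))
    event_times = sorted({t for t, e, _ in data if e})
    observed = expected = variance = 0.0
    for t in event_times:
        at_risk = [g for (ti, _, g) in data if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # carriers still at risk at time t
        d = sum(1 for (ti, e, _) in data if ti == t and e)
        d1 = sum(1 for (ti, e, g) in data if ti == t and e and g)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    z = (observed - expected) / math.sqrt(variance)
    # Asymptotic step: treat z as standard normal, P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
carrier = [1, 1, 1, 0, 0, 0]
z, p = logrank_asymptotic(times, events, carrier)
```

The last line is exactly where accuracy breaks down: when one group is tiny, the normal tail can be far from the true distribution of the statistic.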

10.1371/journal.pcbi.1004071 article EN cc-by PLoS Computational Biology 2015-05-07

We present SPuManTE, an efficient algorithm for mining significant patterns from a transactional dataset. SPuManTE controls the Family-wise Error Rate: it ensures that the probability of reporting one or more false discoveries is less than a user-specified threshold. A key ingredient of SPuManTE is UT, our novel unconditional statistical test for evaluating the significance of a pattern, which requires fewer assumptions on the data generation process and is more appropriate for a knowledge discovery setting than classical conditional tests, such as...

10.1145/3292500.3330978 article EN 2019-07-25

The identification of significant patterns, defined as patterns whose frequency significantly deviates from what is expected under a suitable null model of the data, is a key data mining task with application in several areas. We present PROMISE, an algorithm for identifying significant sequential patterns while guaranteeing that the probability that one or more false discoveries are reported in the output (i.e., the Family-Wise Error Rate, FWER) is less than a user-defined threshold. PROMISE employs the Westfall-Young method to correct for multiple...
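The Westfall-Young method mentioned above derives a corrected threshold from the distribution of the minimum p-value over datasets drawn from a null model. The sketch assumes those minimum p-values have already been computed (how PROMISE generates null datasets of sequences is not reproduced here) and uses hypothetical function names.

```python
import math
import random


def westfall_young_threshold(null_min_pvalues, alpha=0.05):
    """Corrected significance threshold from null minimum p-values.

    null_min_pvalues[j] is the smallest p-value over all tested patterns in the
    j-th dataset drawn from the null model. Patterns whose observed p-value is
    strictly below the returned threshold are reported; the fraction of null
    datasets with any p-value below it is at most alpha.
    """
    q = sorted(null_min_pvalues)
    k = math.floor(alpha * len(q))
    return q[k] if k < len(q) else 1.0


def simulate_independent_tests(m, n_null=20_000, alpha=0.05, seed=1):
    """Toy check: m independent tests, so every null p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    null_min = [min(rng.random() for _ in range(m)) for _ in range(n_null)]
    return westfall_young_threshold(null_min, alpha)
```

For independent tests the threshold should approach the Šidák correction, 1 - (1 - alpha)^(1/m), so `simulate_independent_tests(10)` lands close to `1 - 0.95 ** 0.1` (about 0.0051). The advantage of Westfall-Young appears when patterns are dependent, as overlapping sequential patterns typically are.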

10.1109/icdm.2019.00169 article EN 2019 IEEE International Conference on Data Mining (ICDM) 2019-11-01

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address this challenge in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions...
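A rough sketch of the comparison described above: count the itemsets of a fixed size k with support at least s, then compare with the average count over random datasets. The null model here (same number of transactions, each item included independently with its observed frequency), the restriction to a single size k, and the function names are assumptions for illustration; the statistical procedure the paper uses to select s* is not reproduced.

```python
import random
from itertools import combinations


def count_frequent(transactions, k, s):
    """Number of k-itemsets contained in at least s transactions."""
    support = {}
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            support[itemset] = support.get(itemset, 0) + 1
    return sum(1 for c in support.values() if c >= s)


def random_dataset(n, item_freq, rng):
    """n transactions; each includes each item independently with its frequency."""
    return [{item for item, f in item_freq.items() if rng.random() < f} for _ in range(n)]


def compare_to_null(transactions, k, s, trials=100, seed=0):
    """Observed count of frequent k-itemsets vs. its average over random datasets."""
    n = len(transactions)
    item_counts = {}
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
    item_freq = {item: c / n for item, c in item_counts.items()}
    rng = random.Random(seed)
    null_counts = [count_frequent(random_dataset(n, item_freq, rng), k, s)
                   for _ in range(trials)]
    return count_frequent(transactions, k, s), sum(null_counts) / trials


# Items "a" and "b" always appear together; under independence they rarely would.
data = [{"a", "b"}] * 50 + [{"c"}] * 50
observed, null_mean = compare_to_null(data, k=2, s=40)
```

Here the single pair {a, b} reaches support 50, while random datasets with the same item frequencies almost never produce any pair with support 40 or more.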

10.1145/2220357.2220359 article EN Journal of the ACM 2012-06-01

The extraction of patterns displaying significant association with a class label is a key data mining task with wide application in many domains. We study a variant of the problem that requires mining the top-k statistically significant patterns, thus providing tight control on the number of patterns reported in output. We develop TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output, and provide theoretical evidence of its effectiveness. TopKWY crucially relies on a novel strategy to explore [...] several implementation...

10.1145/3219819.3219997 article EN 2018-07-19