Michael Toomey

ORCID: 0000-0001-8206-6414
Research Areas
  • Cancer Genomics and Diagnostics
  • Radiomics and Machine Learning in Medical Imaging
  • AI in cancer detection
  • Genomics and Rare Diseases
  • Genetics, Bioinformatics, and Biomedical Research
  • Lung Cancer Treatments and Mutations
  • Genetic factors in colorectal cancer
  • Bioinformatics and Genomic Networks
  • Health Systems, Economic Evaluations, Quality of Life
  • Lung Cancer Diagnosis and Treatment
  • Genetic Associations and Epidemiology
  • BRCA gene mutations in cancer
  • Machine Learning in Healthcare
  • Artificial Intelligence in Healthcare
  • Microbial Community Ecology and Physiology
  • Cancer Diagnosis and Treatment
  • Scientific Computing and Data Management
  • Genomics and Phylogenetic Studies

Cornell University
2023-2024

Weill Cornell Medicine
2023-2024

Memorial Sloan Kettering Cancer Center
2024

Tri-Institutional PhD Program in Chemical Biology
2023-2024

Abstract Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor-type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole-genome sequencing (WGS), or predicting across limited cancer types. We use a data set of 39,787 solid tumors sequenced using a targeted gene panel to develop...

10.1158/2159-8290.cd-23-0996 article EN cc-by-nc-nd Cancer Discovery 2024-02-27

Features leading to correct predictions of all 38 cancer types included, aggregated by broad feature category, using normalized Shapley value effect score, as previously described (Fig S6, Methods).

10.1158/2159-8290.25956851 preprint EN 2024-06-03

Assessment of GDD-ENS performance on the test set across reported purity values, binned in 10% increments. Count in each bin (top), and corresponding overall accuracy (light pink) and high-confidence accuracy (dark pink). Samples with NA purity values or purity values >90% were removed, representing only 106 samples total.

10.1158/2159-8290.25956857 preprint EN cc-by 2024-06-03
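
The purity-binned assessment in the caption above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name, the tuple layout of `samples`, and the choice to place a purity of exactly 90% in the last bin are all assumptions made for the example.

```python
def accuracy_by_purity_bin(samples, bin_width=10, max_purity=90):
    """Summarize accuracy per purity bin.

    samples: iterable of (purity_percent_or_None, is_correct, is_high_confidence).
    This layout is hypothetical, chosen only for illustration.
    """
    last_bin = max_purity // bin_width - 1
    tallies = {}
    for purity, correct, high_conf in samples:
        # Drop samples with missing purity or purity above the cutoff.
        if purity is None or purity > max_purity:
            continue
        # Assumption: a purity equal to the cutoff falls in the last bin.
        index = min(int(purity // bin_width), last_bin)
        tally = tallies.setdefault(index * bin_width, [0, 0, 0, 0])
        tally[0] += 1
        tally[1] += int(correct)
        if high_conf:
            tally[2] += 1
            tally[3] += int(correct)
    summary = {}
    for start in sorted(tallies):
        n, n_correct, hc_n, hc_correct = tallies[start]
        summary[start] = {
            "count": n,
            "accuracy": n_correct / n,
            # None when a bin has no high-confidence predictions.
            "high_conf_accuracy": hc_correct / hc_n if hc_n else None,
        }
    return summary
```

Each key of the returned dictionary is the lower edge of a bin, so `summary[80]` would describe samples with purity from 80% up to and including 90% under the assumptions above.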

For each broad category of features present within our training set, we trained individual models using the same training regime as GDD-ENS. Results are shown across these categories, each represented by a circle. We then iteratively combined and retrained models, adding feature groups in decreasing order of the accuracy of their individual model, each represented by an X. The X corresponding to the category indicated on the X-axis corresponds to the model trained on all categories to the left of it. ... has the highest overall accuracy on held-out data. CN, Copy Number.

10.1158/2159-8290.25956869 preprint EN cc-by 2024-06-03

Overall heatmap of top prediction across all confidence values. Heatmap is row-normalized and sorted by overall precision. Off-target values represent the proportion of the predicted type (rows) that is of the true type along the columns. NSCLC, Non-Small Cell Lung Cancer; GIST, Gastrointestinal Stromal Tumor; SQC, Squamous Cell Carcinoma; SCLC, Small Cell Lung Cancer; PNET, Pancreatic Neuroendocrine Tumor; Lu-NET, Lung Neuroendocrine Tumor; GI-NET, Gastro-intestinal Neuroendocrine Tumor; Carc., Carcinoma; MPNST, Malignant Peripheral Nerve Sheath Tumor.

10.1158/2159-8290.25956863 preprint EN cc-by 2024-06-03
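
As a rough illustration of the row-normalized heatmap described in the caption above, the sketch below builds a confusion matrix with predicted types as rows and true types as columns, divides each row by its total, and orders the types by the diagonal value (precision). The function name and input layout are hypothetical, and treating rows as predicted types is an assumption read from the caption.

```python
def row_normalized_confusion(pairs):
    """Build a row-normalized confusion matrix.

    pairs: iterable of (predicted_label, true_label); hypothetical layout.
    Returns (matrix, order), where matrix[predicted][true] is a proportion
    and order lists labels by decreasing precision.
    """
    pairs = list(pairs)
    labels = sorted({label for pair in pairs for label in pair})
    counts = {p: {t: 0 for t in labels} for p in labels}
    for predicted, true in pairs:
        counts[predicted][true] += 1
    matrix = {}
    for predicted in labels:
        row_total = sum(counts[predicted].values())
        # A type that is never predicted gets an all-zero row.
        matrix[predicted] = {
            true: (counts[predicted][true] / row_total if row_total else 0.0)
            for true in labels
        }
    # With rows as predictions, the diagonal entry is the precision of that type.
    order = sorted(labels, key=lambda label: matrix[label][label], reverse=True)
    return matrix, order
```

With this orientation every off-diagonal cell in a row reads as "of the samples predicted to be this type, the fraction that were actually that other type".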

S11A: Expected Panel Performance (Masked Analysis). S11B: Performance of GDD-ENS on the UCSF test set. Acc: Recalibrated accuracy comparable to the cohort after correction for the difference in the distribution of cancer types within the cohort.

10.1158/2159-8290.25956833 preprint EN cc-by 2024-06-03

Flow of results for combination prior using either metastatic site (left) or histology (right). Metastatic site is only applied to all metastatic samples (n = 2166), histology to all test set examples with histological annotations (n = 4571). Arrow base represents the pre-adjustment category, arrow head the post-adjustment category. Circle indicates the number of samples that did not change categories after adjustment, i.e., 1406 samples were correct and high confidence before and after skewing with biopsy site annotations.

10.1158/2159-8290.25956872 preprint EN cc-by 2024-06-03

Shapley values aggregated across 10 major organ systems, as described in Supp. Table S13. First column represents broad feature category importance across all correct, in-distribution predictions. Second and third columns represent top features for correct and incorrect predictions, respectively, regardless of in- or out-of-distribution status. Number of predictions within each is shown in the bottom right of the figure.

10.1158/2159-8290.25956848 preprint EN cc-by 2024-06-03

(A) Overall proportion of all annotated ancestries across the training set (left) and testing set (right). Numbers over each bar indicate the total within each category. EUR, European; ADM, Admixed; EAS, East Asian; AFR, African; SAS, South Asian; NAM, Native American. (B) High-confidence accuracy (right) for European (EUR), East Asian (EAS), African (AFR), and South Asian (SAS) ancestries compared to overall high-confidence accuracy on the test set. P-values are from a two-sided Fisher's exact test comparing proportions these metrics distribution per ancestry...

10.1158/2159-8290.25956860 preprint EN cc-by 2024-06-03

Shapley values significantly associated with cancer types. Shapnorm = Shapley effect score normalized across all predictions of the cancer type; stat_shap and pval_shap = Bonferroni-corrected outputs of Mann-Whitney U tests of the distribution of Shapley value scores for cancer type vs. non-cancer type predictions; shap_rank_ct = rank of the feature association within the cancer type it is associated with (i.e., 1 = top feature).

10.1158/2159-8290.25956821 preprint EN cc-by 2024-06-03

Heatmap of Association Scores across cancer types, defined as aggregated Shapley value effect scores, for signatures (A), structural variants (B), and chromosome arms (C) with the highest p-value following Mann-Whitney U-test. For (C), red associations indicate a gain in the arm, while blue indicates a loss. * = significant association, Bonferroni-corrected P-value < .05.

10.1158/2159-8290.25956845 preprint EN cc-by 2024-06-03

Top ten most important features leading to correct predictions of all 38 cancer types included, as approximated by normalized Shapley value effect scores. We sum absolute Shapley values per feature, indicate whether the feature was present or absent by multiplying by 1 or -1, and sum each total across predictions within each cancer type. Features that correspond to the same gene or arm segment are grouped, and the top features are determined after aggregating by genes and segments.

10.1158/2159-8290.25956854 preprint EN cc-by 2024-06-03
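
The signed aggregation described in the caption above might look something like the sketch below. It is one reading of the caption, not the published method: the record layout and function name are hypothetical, the scaling by total absolute mass within a cancer type is an assumed stand-in for the "normalized" effect score, and the grouping of features by gene or arm segment is omitted.

```python
def top_signed_features(records, k=10):
    """Rank features per cancer type by signed, aggregated |Shapley| values.

    records: iterable of (cancer_type, feature, shap_value, present);
    this layout is hypothetical.
    """
    totals = {}
    for cancer_type, feature, shap_value, present in records:
        # Magnitude comes from |shap_value|; sign encodes presence (+1) or absence (-1).
        sign = 1.0 if present else -1.0
        per_type = totals.setdefault(cancer_type, {})
        per_type[feature] = per_type.get(feature, 0.0) + sign * abs(shap_value)
    ranked = {}
    for cancer_type, per_type in totals.items():
        # Assumption: normalize by the total absolute mass within the type.
        scale = sum(abs(value) for value in per_type.values()) or 1.0
        scored = [(feature, value / scale) for feature, value in per_type.items()]
        scored.sort(key=lambda item: abs(item[1]), reverse=True)
        ranked[cancer_type] = scored[:k]
    return ranked
```

A positive score then reads as "presence of this feature pushed toward the type", and a negative score as "absence of this feature pushed toward the type".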

Abstract Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor-type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole-genome sequencing (WGS), or predicting across limited cancer types. We use a data set of 39,787 solid tumors sequenced using a targeted gene panel to...

10.1158/2159-8290.c.7265867 preprint EN 2024-06-03

Flow of results for combination prior using either metastatic site (left) or histology (right). Metastatic site is only applied to all metastatic samples (n = 2166), histology to all test set examples with histological annotations (n = 4571). Arrow base represents the pre-adjustment category, arrow head the post-adjustment category. Circle indicates the number of samples that did not change categories after adjustment, i.e., 1406 samples were correct and high confidence before and after skewing with biopsy site annotations.

10.1158/2159-8290.25956872.v1 preprint EN cc-by 2024-06-03

Features leading to correct predictions of all 38 cancer types included, aggregated by broad feature category, using normalized Shapley value effect score, as previously described (Fig S6, Methods).

10.1158/2159-8290.25956851.v1 preprint EN cc-by 2024-06-03

For each broad category of features present within our training set, we trained individual models using the same training regime as GDD-ENS. Results are shown across these categories, each represented by a circle. We then iteratively combined and retrained models, adding feature groups in decreasing order of the accuracy of their individual model, each represented by an X. The X corresponding to the category indicated on the X-axis corresponds to the model trained on all categories to the left of it. ... has the highest overall accuracy on held-out data. CN, Copy Number.

10.1158/2159-8290.25956869.v1 preprint EN cc-by 2024-06-03

Normalized absolute Shapley value scores for all KRAS-related features across cancer types with KRAS implicated within the top ten most predictive features per type. ... indicate presence of any broad alteration that affects KRAS in a sample, while hotspot indicates a hotspot alteration, and Amp reflects amplification. All other rows represent specific hotspots. Importance of each feature differs across cancer types.

10.1158/2159-8290.25956878.v1 preprint EN cc-by 2024-06-03

Stepwise accuracy of various models generated during model development, including direct comparison of GDD-RF to GDD-ENS. Acc. = Accuracy; Conf. = Confidence; Macro Prec: Class-Averaged Precision; % Excluded: proportion of high-content, solid tumor training data that is not represented by the cancer types. RF: Random Forest. Feature Set, set and testing samples features used in original model. GDD-ENS feature updated final GDD-ENS-22 test set: but limited 22 types before expansion. GDD-RF-EXT...

10.1158/2159-8290.25956806.v1 preprint EN cc-by 2024-06-03