- Cancer Genomics and Diagnostics
- Genomics and Rare Diseases
- RNA modifications and cancer
- Genetic factors in colorectal cancer
- Epigenetics and DNA Methylation
- Distributed and Parallel Computing Systems
- Scientific Computing and Data Management
- Gene expression and cancer classification
- Genetics, Bioinformatics, and Biomedical Research
- Childhood Cancer Survivors' Quality of Life
- Cancer-related molecular mechanisms research
- Molecular Biology Techniques and Applications
- Prenatal Screening and Diagnostics
- Advanced Data Storage Technologies
- Genomics and Phylogenetic Studies
- Neuroblastoma Research and Treatments
- Acute Lymphoblastic Leukemia research
- Single-cell and spatial transcriptomics
- Iron Metabolism and Disorders
- Lung Cancer Research Studies
- DNA Repair Mechanisms
- Hemoglobinopathies and Related Disorders
- Gene Regulatory Network Analysis
- Bioinformatics and Genomic Networks
- Cloud Computing and Resource Management
St. Jude Children's Research Hospital
2016-2025
Juno Therapeutics (Germany)
2018
Pfizer (United Kingdom)
2018
Gilead Sciences (Germany)
2018
Medtronic (United States)
2018
Cytokinetics (United States)
2018
Incyte (United States)
2018
Alpine Immune Sciences (United States)
2018
University of Notre Dame
2010-2014
Abstract To evaluate the potential of an integrated clinical test to detect diverse classes of somatic and germline mutations relevant to pediatric oncology, we performed three-platform whole-genome (WGS), whole-exome (WES), and transcriptome (RNA-Seq) sequencing of tumors and normal tissue from 78 cancer patients in a CLIA-certified, CAP-accredited laboratory. Our analysis pipeline achieves high accuracy by cross-validating variants between sequencing types, thereby removing the need for confirmatory testing, and facilitates...
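The cross-validation idea can be illustrated with a minimal Python sketch. It assumes each platform yields a set of variant calls keyed by (chromosome, position, ref, alt) and treats a variant as validated when at least two platforms report it. The key, the two-platform threshold, and the function name are illustrative assumptions, not the published pipeline's rules.

```python
from collections import defaultdict

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def cross_validate(calls: dict[str, set[Variant]],
                   min_platforms: int = 2) -> dict[Variant, set[str]]:
    """Keep variants reported by at least `min_platforms` sequencing platforms.

    `calls` maps a platform name (e.g. "WGS", "WES", "RNA-seq") to its call set.
    Returns each retained variant mapped to the platforms that support it.
    """
    support: dict[Variant, set[str]] = defaultdict(set)
    for platform, variants in calls.items():
        for variant in variants:
            support[variant].add(platform)
    return {v: platforms for v, platforms in support.items()
            if len(platforms) >= min_platforms}
```

A real pipeline would also need to account for regions a platform cannot see (WES mainly covers exons, and RNA-seq detects only expressed transcripts), so "not called" on one platform is not the same as "absent".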
Abstract Effective data sharing is key to accelerating research to improve diagnostic precision, treatment efficacy, and long-term survival in pediatric cancer and other childhood catastrophic diseases. We present St. Jude Cloud (https://www.stjude.cloud), a cloud-based data-sharing ecosystem for accessing, analyzing, and visualizing genomic data from >10,000 patients with cancer and survivors, and >800 sickle cell patients. Harmonized data totaling 1.25 petabytes are freely available, including 12,104 whole...
Purpose Childhood cancer survivors are at increased risk of subsequent neoplasms (SNs), but the germline genetic contribution is largely unknown. We assessed the contribution of pathogenic/likely pathogenic (P/LP) mutations in cancer predisposition genes to their SN risk. Patients and Methods Whole-genome sequencing (30-fold) was performed on samples from childhood cancer survivors who were ≥ 5 years since initial cancer diagnosis and participants in the St Jude Lifetime Cohort Study, a retrospective hospital-based study with prospective clinical...
Abstract To discover driver fusions beyond canonical exon-to-exon chimeric transcripts, we develop CICERO, a local assembly-based algorithm that integrates RNA-seq read support with extensive annotation for candidate ranking. CICERO outperforms commonly used methods, achieving a 95% detection rate for 184 independently validated driver fusions, including internal tandem duplications and other non-canonical events, in 170 pediatric cancer transcriptomes. Re-analysis of TCGA glioblastoma unveils previously...
Genomic studies of pediatric cancer have primarily focused on specific tumor types or high-risk disease. Here, we used a three-platform sequencing approach, including whole-genome (WGS), whole-exome (WES), and RNA (RNA-seq) sequencing, to examine germline genomes from 309 prospectively identified children with newly diagnosed (85%) or relapsed/refractory (15%) cancers, unselected for tumor type. Eighty-six percent of patients harbored diagnostic (53%), prognostic (57%), therapeutically relevant (25%), and/or...
Many signaling and other genes known as "hidden" drivers may not be genetically or epigenetically altered or differentially expressed at the mRNA or protein levels, but, rather, drive a phenotype such as tumorigenesis via post-translational modification mechanisms. However, conventional approaches based on genomics or differential expression are limited in exposing hidden drivers. Here, we present a comprehensive algorithm and toolkit, NetBID2 (data-driven network-based Bayesian inference of drivers, version...
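To illustrate why network-based inference can expose drivers whose own expression does not change, here is a minimal sketch in which a driver's activity in each sample is the signed mean z-score of its target genes. The scoring rule, gene names, and network are hypothetical; NetBID2 itself builds data-driven networks and applies Bayesian inference, which this sketch does not reproduce.

```python
from statistics import mean, pstdev


def zscore(values: list[float]) -> list[float]:
    """Standardize one gene's expression across samples (all zeros if constant)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def driver_activity(expression: dict[str, list[float]],
                    targets: dict[str, int]) -> list[float]:
    """Per-sample activity of one driver, inferred only from its targets.

    expression: gene -> expression values, one per sample
    targets: target gene -> +1 if the driver activates it, -1 if it represses it
    """
    z = {gene: zscore(expression[gene]) for gene in targets if gene in expression}
    if not z:
        raise ValueError("none of the driver's targets are in the expression data")
    n_samples = len(next(iter(z.values())))
    # Repressed targets are sign-flipped so that all evidence points the same way.
    return [mean(targets[gene] * z[gene][i] for gene in z) for i in range(n_samples)]
```

For example, with a driver whose own mRNA is flat (`[5.0, 5.0, 5.0]`), an activated target rising `[1.0, 2.0, 3.0]` and a repressed target falling `[3.0, 2.0, 1.0]`, the inferred activity increases from the first to the third sample even though the driver's expression never changes.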
Abstract Knowledge about molecular targets for pediatric cancer has accelerated exponentially in recent years thanks to the increased application of multi-omics profiling in both research and clinical settings. The efficacy of genomic-based interventions may depend on whether the observed genomic abnormalities are fundamental to the pathogenesis of a specific subtype. At present, such information is limited due to the rapid evolution of subtype discovery and classification, as well as disease heterogeneity owing to the presence of many...
Individuals with monogenic disorders can experience variable phenotypes that are influenced by genetic variation. To investigate this in sickle cell disease (SCD), we performed whole-genome sequencing (WGS) of 722 individuals with hemoglobin HbSS or HbSβ0-thalassemia from Baylor College of Medicine and the St. Jude Children's Research Hospital Sickle Cell Clinical Research and Intervention Program (SCCRIP) longitudinal cohort study. We developed pipelines to identify variants that modulate polymerization and red blood...
Bioinformatics researchers need efficient means to process large collections of genomic sequence data. One application of interest, genome assembly, has great potential for parallelization; however, most previous attempts at parallelization require uncommon high-end hardware. This paper introduces the Scalable Assembler at Notre Dame (SAND) framework, which can achieve significant speedup using large numbers of commodity machines harnessed from clusters, clouds, and grids. SAND interfaces with Celera...
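SAND's premise is that the candidate-pair alignment step of assembly splits into many independent tasks. The sketch below mimics that structure under stated assumptions: a naive suffix-prefix overlap stands in for real sequence alignment, and a local `ThreadPoolExecutor` stands in for remote commodity workers. For CPU-bound work in CPython, a process pool or a distributed framework would be needed for real speedup. None of these names are SAND's API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def batches(items, size):
    """Yield successive fixed-size batches; each batch becomes one task."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def align_batch(reads, pairs):
    """Score one batch of candidate pairs (stands in for a remote worker task)."""
    return [(i, j, suffix_prefix_overlap(reads[i], reads[j])) for i, j in pairs]


def parallel_overlaps(reads, candidate_pairs, batch_size=2, workers=4):
    """Distribute candidate pairs in batches and keep pairs that overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(align_batch, reads, b)
                   for b in batches(candidate_pairs, batch_size)]
        results = [r for f in futures for r in f.result()]
    return [r for r in results if r[2] > 0]
```

Batching is the main design knob here: tasks that are too small drown in dispatch overhead, while tasks that are too large leave workers idle at the end of a run.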
In this paper, we discuss the challenges of common bioinformatics applications when they are deployed outside their initial development environments. We propose a three-tiered approach to mitigate some of these issues by leveraging an encapsulation tool, a high-level workflow language, and a portable intermediary. As a case study, we apply this approach to refactor a custom EST analysis pipeline. The Starch tool encapsulates program dependencies to simplify task specification and deployment. The Weaver language provides abstractions for...
ABSTRACT Effective data sharing is key to accelerating research that will improve the precision of diagnoses, the efficacy of treatments and long-term survival in pediatric cancer and other childhood catastrophic diseases. We present St. Jude Cloud (https://www.stjude.cloud), a cloud-based ecosystem developed via collaboration between St. Jude Children’s Research Hospital, DNAnexus, and Microsoft, for accessing, analyzing and visualizing genomic data from >10,000 pediatric cancer patients, survivors, and >800 sickle cell patients....
Next-generation sequencing technologies have enabled the sequencing of many genomes. Because of the overall increasing demand and the inherent parallelism available in the required analyses, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a speed-up of 45x using 50 workers on a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation framework (MAKER) is parallelised as an MPI...
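As a worked reading of the reported scaling, standard parallel efficiency (speedup divided by the number of workers) and the serial fraction implied by Amdahl's law can be computed from the two numbers in the abstract. Applying Amdahl's law here is an assumption (it ignores communication overhead and any change in workload), so the serial fraction is only indicative.

$$
E = \frac{S}{p} = \frac{45}{50} = 0.90
$$

$$
S = \frac{1}{f + \frac{1 - f}{p}} \;\Rightarrow\; f = \frac{p/S - 1}{p - 1} = \frac{50/45 - 1}{49} = \frac{1}{441} \approx 0.0023
$$

That is, the run reaches about 90% of ideal linear scaling, which under these assumptions corresponds to well under 1% of the single-worker runtime being serial.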
Abstract Although genome-wide DNA methylomes have demonstrated their clinical value as reliable biomarkers for tumor detection, subtyping, and classification, their direct biological impacts at the individual gene level remain elusive. Here we present MethylationToActivity (M2A), a machine learning framework that uses convolutional neural networks to infer promoter activities, based on H3K4me3 and H3K27ac enrichment, from the methylation patterns of individual genes. Using publicly available datasets in real-world test...
Clusters, clouds, and grids offer access to large-scale computational resources at low cost. This is especially appealing to scientific applications that require very large amounts of computation to compete in the research space. However, the resources available across these platforms differ significantly in their availability, hardware, environment, performance, cost of use, and more. This requires the use of elastic applications that can adapt at run-time, transparently handling heterogeneity and failures. In this paper, we present case studies of several applications built using Work Queue...
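The elastic master-worker pattern described above can be sketched with the Python standard library alone. This is not the Work Queue API: local threads stand in for remote workers, a task that raises an exception stands in for a failed or evicted worker, and the master handles it by resubmitting the task up to a hypothetical `max_attempts` limit.

```python
import queue
import threading
from typing import Callable


def run_master_worker(tasks: dict[str, Callable[[], object]], n_workers: int = 4,
                      max_attempts: int = 3) -> tuple[dict[str, object], dict[str, int]]:
    """Run each task on a pool of workers, resubmitting tasks whose attempt fails.

    Returns (results, attempts): results holds only tasks that eventually
    succeeded; attempts records how many tries each task used.
    """
    pending: queue.Queue = queue.Queue()
    results: dict[str, object] = {}
    attempts: dict[str, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = pending.get()
            if item is None:  # sentinel from the master: no more work
                pending.task_done()
                return
            task_id, attempt = item
            try:
                value = tasks[task_id]()
                with lock:
                    results[task_id] = value
                    attempts[task_id] = attempt
            except Exception:
                if attempt < max_attempts:
                    # Re-queue before task_done() so join() cannot return early.
                    pending.put((task_id, attempt + 1))
                else:
                    with lock:
                        attempts[task_id] = attempt
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task_id in tasks:
        pending.put((task_id, 1))
    pending.join()  # every task has either succeeded or exhausted its attempts
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return results, attempts


def flaky(value: object) -> Callable[[], object]:
    """Build a task that fails on its first call, simulating a lost worker."""
    calls = {"n": 0}

    def run() -> object:
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("simulated worker failure")
        return value

    return run
```

Because the master, not the task author, owns retries, the same task code runs unchanged whether workers are stable cluster nodes or preemptible cloud instances.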
SUMMARY Weaver is a high-level distributed computing framework that enables researchers to construct scalable scientific data-processing workflows. Instead of developing a new workflow language, we introduce a domain-specific language built on top of Python called Weaver, which takes advantage of users' familiarity with the programming language, minimizes barriers to adoption, and allows for integration with the rich ecosystem of existing software. In this paper, we provide an overview of Weaver's model, how users organize and specify...
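To make the idea of a workflow DSL embedded in Python concrete, here is a hypothetical miniature: ordinary Python loops and variables record tasks, which are then emitted as Make-style rules for a separate execution engine to run. `Workflow`, `run`, and `emit` are invented names for illustration; they are not Weaver's API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    command: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class Workflow:
    """Hypothetical mini-DSL: plain Python records tasks, then emits Make-style rules."""
    tasks: list[Task] = field(default_factory=list)

    def run(self, command: str, inputs: list[str], outputs: list[str]) -> list[str]:
        """Record one task; returning its outputs lets later tasks depend on them."""
        self.tasks.append(Task(command, inputs, outputs))
        return outputs

    def emit(self) -> str:
        """Render each task as 'outputs: inputs' followed by a tab-indented command."""
        return "\n\n".join(
            f"{' '.join(t.outputs)}: {' '.join(t.inputs)}\n\t{t.command}"
            for t in self.tasks
        )
```

A list comprehension over input chunks (for example, calling `wf.run(f"./align {c}.fa > {c}.out", [f"{c}.fa"], [f"{c}.out"])` for each chunk `c`, where `./align` is a placeholder command) followed by one merge task yields a fan-out/fan-in graph without any new syntax, which is the kind of familiarity benefit the abstract describes.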
Next-generation sequencing technologies have enabled various entities, ranging from large centers to individual laboratories, to sequence organisms of choice and analyze them on demand. Sequencing analysis, however, is only part of the equation: to learn about a certain organism, scientists need to annotate it. Each of these problems is highly parallel at the basic level of computation, yet few applications support single parallelization frameworks such as MPI. Because of the overall increasing demand for computational...
ABSTRACT Summary Xenografts are important models for cancer research, and the presence of mouse reads in xenograft next-generation sequencing data can potentially confound the interpretation of experimental results. We present an efficient, cloud-based BAM-to-BAM cleaning tool called XenoCP to remove mouse reads from xenograft BAM files. We show its application in obtaining accurate gene expression quantification in RNA-seq and tumor heterogeneity in WGS of xenografts derived from brain and solid tumors. Availability and Implementation St. Jude Cloud (...
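XenoCP's exact procedure is not described in this excerpt, so the following is only a hypothetical sketch of the core decision such a cleaner has to make. It assumes each read carries one alignment score against the graft (human) reference and one against the host (mouse) reference, for example taken from an aligner's `AS` tag, and flags a read as mouse-derived only when it aligns strictly better to mouse. The `Read` class, the tie-breaking rule, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    name: str
    human_score: Optional[int]  # alignment score vs. graft (human) reference; None if unaligned
    mouse_score: Optional[int]  # alignment score vs. host (mouse) reference; None if unaligned


def is_mouse(read: Read) -> bool:
    """Call a read host-derived only if it aligns strictly better to mouse."""
    if read.mouse_score is None:
        return False
    if read.human_score is None:
        return True
    return read.mouse_score > read.human_score


def clean(reads: list[Read]) -> tuple[list[Read], list[Read]]:
    """Split reads into (kept, flagged_as_mouse), preserving input order."""
    kept: list[Read] = []
    flagged: list[Read] = []
    for read in reads:
        (flagged if is_mouse(read) else kept).append(read)
    return kept, flagged
```

Keeping ties with the graft is a conservative choice for tumor analysis. A BAM-to-BAM tool could also mark flagged reads as unmapped instead of dropping them, so that the output file keeps the same set of reads for downstream tools.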
Abstract High-throughput DNA sequencing technologies have enabled unbiased screening of genomic alterations such as single nucleotide variants (SNVs) and small insertions/deletions (indels). However, variant analysis using transcriptomic (RNA-seq) data has not become standard due to challenges in distinguishing true variants, in particular indels, from artifacts that can arise from RNA-seq mapping and library preparation. We previously developed a tool, RNAIndel, which classifies indels as somatic,...
Abstract Childhood cancer survivors are at increased risk of subsequent neoplasms (SN), largely considered to be therapy-related. Studies of cancer predisposition genes (CPGs) and SN among long-term survivors are lacking. We characterized germline mutations in CPGs in childhood cancer survivors to determine their contribution to SN risk. Whole-genome (30x) and whole-exome (100x) sequencing was performed for 2988 5+ year survivors (1629 leukemia/lymphoma, 332 CNS, 1027 other solid tumors; 53% male; median follow-up 28 [range 6-55] years). Survivors underwent a...
The empirical aim of this paper is motivated by the anecdotal belief, among the professional and non-professional investment community, that a “low” reading in the CBOE Volatility Index (VIX) or a large decline in the VIX alone is ample reason to believe volatility will spike in the near future. While the VIX can be a useful tool for investors and traders, it is often misinterpreted and poorly used. This paper demonstrates that VIX dispersion acts as a better predictor of future VIX spikes.
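The abstract does not define "dispersion" or a prediction rule, so the following is only an assumed operationalization: dispersion as the rolling sample standard deviation of VIX levels, with a hypothetical low-dispersion threshold used as a signal. The function names, window, and threshold are illustrative, and any values passed in during testing are synthetic, not market data.

```python
from statistics import stdev
from typing import Optional


def rolling_dispersion(levels: list[float], window: int) -> list[Optional[float]]:
    """Sample standard deviation of the last `window` index levels at each step.

    Entries are None until `window` observations are available.
    """
    out: list[Optional[float]] = []
    for t in range(len(levels)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(stdev(levels[t + 1 - window: t + 1]))
    return out


def low_dispersion_flags(levels: list[float], window: int, threshold: float) -> list[bool]:
    """Flag steps where dispersion falls below a hypothetical threshold."""
    return [d is not None and d < threshold
            for d in rolling_dispersion(levels, window)]
```

The contrast with the "low reading" heuristic is that this signal depends on how tightly the index has been clustered recently, not on its level, so a calm series at a moderate level can be flagged while a volatile series at a low level is not.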
Abstract Gene fusions are important biomarkers for cancer diagnosis, subtype classification and therapeutic decision-making. While fusion detection using RNA-seq data has become a standard practice, existing computational methods primarily focus on identifying canonical exon-to-exon fusions. However, more complex events such as multi-partner fusions, truncations, enhancer hijacking, and internal tandem duplications (ITD) can also lead to abnormal function or aberrant transcription of driver...