- Genomics and Chromatin Dynamics
- Genomics and Rare Diseases
- Bioinformatics and Genomic Networks
- Gene expression and cancer classification
- Genetic Associations and Epidemiology
- Machine Learning in Bioinformatics
- Genomics and Phylogenetic Studies
- Epigenetics and DNA Methylation
- Nonmelanoma Skin Cancer Studies
- Nutrition, Genetics, and Disease
- Genomic variations and chromosomal abnormalities
- RNA Research and Splicing
- Birth, Development, and Health
- RNA and protein synthesis mechanisms
- Renal and related cancers
- Cancer Genomics and Diagnostics
- Advanced Proteomics Techniques and Applications
- Muscle Physiology and Disorders
- Cutaneous lymphoproliferative disorders research
- RNA modifications and cancer
- Immunotherapy and Immune Responses
- Genetic and Kidney Cyst Diseases
- Genetics, Aging, and Longevity in Model Organisms
- Adipose Tissue and Metabolism
- Vector-Borne Animal Diseases
Chan Zuckerberg Initiative (United States)
2022-2025
University of California, Berkeley
2020-2024
Berkeley College
2024
Stanford University
2016-2020
Jain Foundation
2018-2019
Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and to guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission...
Age is the primary risk factor for many common human diseases. Here, we quantify the relative contributions of genetics and aging to gene expression patterns across 27 tissues from 948 humans. We show that the predictive power of expression quantitative trait loci is impacted by age in many tissues. Jointly modelling the contributions of age and genetics to transcript level variation, we find that expression heritability (h2) is consistent among tissues while the contribution of aging varies by >20-fold, with [Formula: see text] in 5 tissues. We find that the force of purifying selection is stronger on genes expressed early versus late in life...
Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In...
Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on...
Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and to suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense...
Background: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. Results: We evaluate models in genomic regions with varying degrees of cell type...
Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid the interpretation of variants of unknown significance identified in whole-genome sequencing studies. We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and...
Deep learning models in genomics that predict molecular phenotypes from DNA sequence traditionally focus on one-hot encoded representations. Here, we develop a novel model that extends this approach by incorporating structural attributes indicative of local DNA shape alongside canonical sequence inputs. This augmentation provides an additional axis for interpretability and aids in identifying regulatory patterns not apparent from sequence alone. Applying this model to the prediction of transcription factor binding (ChIP-seq) demonstrates...
Genomic sequence-to-expression deep learning models, which are trained to predict gene expression and other molecular phenotypes across the reference genome, have recently been shown to have poor out-of-the-box performance in predicting gene expression variation across individuals based on their personal genome sequences. Here we explore whether additional training (fine-tuning) on paired personal genome and transcriptome data improves the performance of such models. Using Enformer as a representative pre-trained model, we explore various fine-tuning strategies. Our...
Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and in 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing...
Kidney disease is highly heritable; however, the causal genetic variants, the cell types in which these variants function, and the molecular mechanisms underlying kidney disease remain largely unknown. To identify genetic loci affecting kidney function, we performed a GWAS using multiple kidney function biomarkers and identified 462 loci. To begin to investigate how these loci affect kidney function, we generated single-cell chromatin accessibility (scATAC-seq) maps of the human kidney and identified candidate...
Gene therapies have the potential to treat disease by delivering therapeutic genetic cargo to disease-associated cells. One limitation to their widespread use is the lack of short regulatory sequences, or promoters, that differentially induce the expression of delivered genetic cargo in target cells, minimizing side effects in other cell types. Such cell-type-specific promoters are difficult to discover using existing methods, requiring either manual curation or access to large datasets of promoter-driven expression from both targeted and...
The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types is a major challenge. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. The ability to design cell-type-specific promoters using model-based optimization would be impactful for research and therapeutic applications. However, models of expression from short synthetic promoters (promoter-driven expression) are lacking for most...
Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields, including benchmarking, auditing, and algorithmic fairness, are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating...
Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS) analyses. Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected...