- RNA and protein synthesis mechanisms
- RNA Research and Splicing
- Genomics and Chromatin Dynamics
- RNA modifications and cancer
- Chromosomal and Genetic Variations
- Machine Learning in Bioinformatics
- RNA Interference and Gene Delivery
- Genomics and Phylogenetic Studies
- Molecular Biology Techniques and Applications
- interferon and immune responses
- CRISPR and Genetic Engineering
- Evolutionary Algorithms and Applications
- Advanced biosensing and bioanalysis techniques
- MicroRNA in disease regulation
- Cancer-related molecular mechanisms research
- Single-cell and spatial transcriptomics
- Chemical Synthesis and Analysis
- Plant Molecular Biology Research
- Research in Cotton Cultivation
- Computational Drug Discovery Methods
- Viral gastroenteritis research and epidemiology
- Congenital heart defects research
- Genetic Associations and Epidemiology
- Genetic Mapping and Diversity in Plants and Animals
- Osteoarthritis Treatment and Mechanisms
Sanofi (United States)
2022-2025
University of Washington
2018-2025
Sanofi (Mexico)
2023
San Diego State University, Imperial Valley Campus
2022
Alphabet (United States)
2022
Enzo Life Sciences (United States)
2020-2021
Howard Hughes Medical Institute
2015
Massachusetts Institute of Technology
2015
Whitehead Institute for Biomedical Research
2015
The University of Texas at Austin
2008
MicroRNA targets are often recognized through pairing between the miRNA seed region and complementary sites within target mRNAs, but not all of these canonical sites are equally effective, and both computational and in vivo UV-crosslinking approaches suggest that many mRNAs are targeted through non-canonical interactions. Here, we show that recently reported non-canonical sites do not mediate repression despite binding the miRNA, which indicates that the vast majority of functional sites are canonical. Accordingly, we developed an improved quantitative model of canonical targeting, using a...
How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic...
Sequence-based machine learning models trained on genome-scale biochemical assays improve our ability to interpret genetic variants by providing functional predictions describing their impact on the cis-regulatory code. Here, we introduce a new model, Borzoi, which learns to predict cell- and tissue-specific RNA-seq coverage from DNA sequence. Using statistics derived from Borzoi’s predicted coverage, we isolate and accurately score variant effects across multiple layers of regulation, including...
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11),...
The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter...
To date, genome-wide association studies have implicated at least 35 loci in osteoarthritis but, due to linkage disequilibrium, the specific variants underlying these associations and the mechanisms by which they contribute to disease risk have yet to be pinpointed. Here, we functionally test 1,605 single nucleotide variants associated with osteoarthritis for regulatory activity using a massively parallel reporter assay. We identify six single nucleotide polymorphisms (SNPs) with differential regulatory activity between the major and minor alleles. We show that the most...
The next phase of genome biology research requires understanding how DNA sequence encodes phenotypes, from the molecular to organismal levels. How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequence through the use of a new deep learning architecture called Enformer that is able to integrate long-range interactions (up to 100 kb...
Alternative DNA conformations, termed non-B DNA structures, can affect transcription, but the underlying mechanisms and their functional impact have not been systematically characterized. Here, we used computational genomic analyses coupled with massively parallel reporter assays (MPRAs) to show that certain non-B DNA structures have a substantial effect on gene expression. Genomic analyses found that non-B DNA structures at promoters harbor an excess of germline variants. Analysis of multiple MPRAs, including a promoter library specifically designed...
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses...
The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states. However, we lack a functional understanding of the sequence features that control the activity of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7%...
3' untranslated regions (3' UTRs) post-transcriptionally regulate mRNA stability, localization, and translation rate. While 3'-UTR isoforms have been globally quantified in limited cell types using bulk measurements, their differential usage among cell types during mammalian development remains poorly characterized. In this study, we examine a dataset comprising ~2 million nuclei spanning E9.5-E13.5 of mouse embryonic development to quantify transcriptome-wide changes in alternative polyadenylation (APA). We observe...
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons...
Enhancers play an important role in morphological evolution and speciation by controlling the spatiotemporal expression of genes. Previous efforts to understand the evolution of enhancers in primates have typically studied many enhancers at low resolution, or single enhancers at high resolution. Although comparative genomic studies reveal large-scale turnover of enhancers, a specific understanding of the molecular steps by which mammalian or primate enhancers evolve remains elusive. We identified candidate hominoid-specific liver enhancers from H3K27ac ChIP-seq data....
The success of the SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) messenger RNA (mRNA) vaccine has led to increased interest in the design and use of mRNA for vaccines and therapeutics. Still, selecting the most appropriate mRNA sequence for a protein remains a challenge. Several recent studies have shown that the specific mRNA sequence can have a significant impact on translation efficiency, half-life, degradation rates, and other issues that play a major role in determining vaccine efficiency. To enable the selection of the most appropriate sequence, we developed...
Measurements of gene expression and signal transduction activity are conventionally performed with methods that require either the destruction or live imaging of a biological sample within the timeframe of interest. Here we demonstrate an alternative paradigm, termed ENGRAM (ENhancer-driven Genomic Recording of transcriptional Activity in Multiplex), in which the activity and dynamics of multiple transcriptional reporters are stably recorded to DNA. ENGRAM is based on the prime editing-mediated insertion of signal- or enhancer-specific barcodes...
Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression, including biodegradability, synthetic accessibility, and transfection efficiency.
Characterization of shared patterns of RNA expression between genes across conditions has led to the discovery of regulatory networks and novel biological functions. However, it is unclear if such coordination extends to translation, a critical step in gene expression. Here, we uniformly analyzed 3,819 ribosome profiling datasets from 117 human and 94 mouse tissues and cell lines. We introduce the concept of Translation Efficiency Covariation (TEC), identifying coordinated translation patterns across cell types. We nominate...