- Genetic Associations and Epidemiology
- Genetic Mapping and Diversity in Plants and Animals
- Genetic and phenotypic traits in livestock
- Bioinformatics and Genomic Networks
- RNA modifications and cancer
- Genomics and Rare Diseases
- Evolution and Genetic Dynamics
- RNA Research and Splicing
- Chromosomal and Genetic Variations
- Gene expression and cancer classification
- Microbial Community Ecology and Physiology
- Gene Regulatory Network Analysis
- Bacterial biofilms and quorum sensing
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- Diffusion and Search Dynamics
- Age of Information Optimization
- Genomics and Chromatin Dynamics
- Evolutionary Algorithms and Applications
- Mental Health Research Topics
- Evolutionary Game Theory and Cooperation
- Smart Grid Energy Management
- Epigenetics and DNA Methylation
- CRISPR and Genetic Engineering
Harvard University
2016-2023
Broad Institute
2016-2021
Harvard University Press
2020
Center for Systems Biology
2016-2017
University of Cambridge
2014
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection...
Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which the variance of per-allele effect sizes is proportional to [p(1 - p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with a best-fit mean of -0.38 (s.e. 0.02) across traits. Despite...
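To make the MAF dependence concrete, here is a minimal Python sketch of the relationship stated in the abstract, in which per-allele effect-size variance is proportional to [p(1 - p)]^α. The function names are hypothetical and the proportionality constant is omitted, so only ratios between variants are meaningful. This is not the authors' estimation procedure, only an illustration of what a given α implies.

```python
def relative_effect_variance(maf, alpha):
    """Relative per-allele effect-size variance, proportional to [p(1 - p)]^alpha.

    The proportionality constant is left out (assumption: we only compare
    variants against each other), so the return value has no absolute meaning.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must lie in (0, 0.5]")
    return (maf * (1.0 - maf)) ** alpha


def rare_to_common_ratio(rare_maf, common_maf, alpha):
    """Ratio of per-allele effect-size variance for a rare vs. a common variant."""
    return (relative_effect_variance(rare_maf, alpha)
            / relative_effect_variance(common_maf, alpha))


if __name__ == "__main__":
    # alpha = -0.38 is the best-fit mean quoted in the abstract;
    # the two MAF values are illustrative choices.
    ratio = rare_to_common_ratio(rare_maf=0.01, common_maf=0.5, alpha=-0.38)
    print(f"variance ratio (MAF 1% vs 50%): {ratio:.2f}")
```

Under these assumptions, α = -0.38 gives a variant with MAF 1% roughly 3.4 times the per-allele effect-size variance of a variant with MAF 50%, while α = 0 would make the two equal.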
Bacteria regulate many phenotypes via quorum sensing systems. Quorum sensing is typically thought to evolve because the regulated cooperative phenotypes are only beneficial at certain cell densities. However, quorum sensing systems are also threatened by non-cooperative "cheaters" that may exploit quorum-sensing cooperation, which raises the question of how quorum sensing systems are maintained in nature. Here we study the evolution of quorum sensing using an individual-based model that captures the natural ecology and population structuring of microbial communities. We first recapitulate two...
Biobank-based genome-wide association studies are enabling exciting insights in complex trait genetics, but much uncertainty remains over best practices for optimizing statistical power and computational efficiency in GWAS while controlling confounders. Here, we introduce a faster version of our BOLT-LMM Bayesian mixed model method (capable of running analyses of the full UK Biobank cohort in a few days on a single compute node) and show that it produces highly powered, robust test statistics when run on all...
Understanding the role of rare variants is important in elucidating the genetic basis of human diseases and complex traits. It is widely believed that negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which the variance of per-allele effect sizes is proportional to [p(1 − p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α by maximizing its profile likelihood in a linear mixed model framework using imputed genotypes,...
Fine-mapping aims to identify causal variants impacting complex traits. Several recent methods improve fine-mapping accuracy by prioritizing variants in enriched functional annotations. However, these methods can only use information at genome-wide significant loci (or a small number of functional annotations), severely limiting the benefit of functional data. We propose PolyFun, a computationally scalable framework using data for a broad set of coding, conserved, regulatory and LD-related annotations. PolyFun prioritizes variants in enriched functional annotations...
Recent work has hinted at the linkage disequilibrium (LD) dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability after conditioning on their minor allele frequency (MAF). However, this has not been formally assessed, quantified or biologically interpreted. Here, we analyzed summary statistics from 56 complex diseases and traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined...
Complex traits and common disease are highly polygenic: thousands of common variants are causal, and their effect sizes are almost always small. Polygenicity could be explained by negative selection, which constrains common-variant effect sizes and may reshape their distribution across the genome. We refer to this phenomenon as flattening, as genetic signal is flattened relative to the underlying biology. We introduce a mathematical definition of polygenicity, the effective number of associated SNPs, and a robust statistical method to estimate it. This...
The genetic architecture of most human complex traits is highly polygenic, motivating efforts to detect polygenic selection involving a large number of loci. In contrast to previous work relying on top GWAS loci, we developed a method that uses genome-wide association statistics and linkage disequilibrium patterns to estimate the genetic component of population differentiation of a complex trait along a continuous gradient, enabling powerful inference of polygenic selection. We analyzed 43 UK Biobank traits and focused on PC1 North-South...
Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, Functionally Informed Novel Discovery Of Risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9–38% increase in...
Transcription factors perform facilitated diffusion (3D diffusion in the cytosol and 1D diffusion on the DNA) when binding to their target sites to regulate gene expression. Here, we investigated the influence of this binding mechanism on gene expression noise. Our results showed that, for biologically relevant parameters, the binding process can be represented by a two-state Markov model and that the accelerated target finding due to facilitated diffusion leads to a reduction in both the mRNA and the protein noise.
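The link between faster target finding and lower noise can be illustrated with the textbook two-state ("telegraph") model of gene expression. The sketch below is a generic illustration under assumed, hypothetical rate names (`k_on` for binding, `k_off` for unbinding, `k_tx` for transcription while bound, `k_deg` for mRNA degradation). It uses the standard closed-form steady-state results for that model and is not the parameterization or code used in the paper.

```python
def telegraph_mrna_stats(k_on, k_off, k_tx, k_deg):
    """Steady-state mRNA mean and Fano factor for a two-state promoter.

    Assumed model (hypothetical names): the promoter switches
    off -> on at rate k_on and on -> off at rate k_off; mRNA is made
    at rate k_tx only in the on state and degrades at rate k_deg.
    """
    if min(k_on, k_tx, k_deg) <= 0 or k_off < 0:
        raise ValueError("k_on, k_tx, k_deg must be positive; k_off non-negative")
    p_on = k_on / (k_on + k_off)        # fraction of time the promoter is on
    mean = p_on * k_tx / k_deg          # mean mRNA copy number
    # Fano factor (variance / mean); 1.0 corresponds to Poisson noise.
    fano = 1.0 + k_tx * k_off / ((k_on + k_off) * (k_on + k_off + k_deg))
    return mean, fano


if __name__ == "__main__":
    # Illustrative rates only: a tenfold higher k_on stands in for
    # accelerated target finding.
    for label, k_on in (("slow search", 0.1), ("fast search", 1.0)):
        mean, fano = telegraph_mrna_stats(k_on=k_on, k_off=1.0, k_tx=10.0, k_deg=1.0)
        print(f"{label}: mean mRNA = {mean:.2f}, Fano factor = {fano:.2f}")
```

In this simplified picture, raising `k_on` while holding the other rates fixed lowers the Fano factor toward the Poisson limit of 1, which is qualitatively consistent with the noise reduction described in the abstract.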
Many diseases and complex traits exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We developed a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and applied S-LDXR to genome-wide association summary statistics for 31 diseases and complex traits in East Asians (EAS) and Europeans (EUR) (average N_EAS = 90K, N_EUR = 267K) with an average trans-ethnic genetic correlation of 0.85 (s.e. 0.01). We determined that squared trans-ethnic genetic correlation was 0.82× (s.e. 0.01) smaller...
There is increasing evidence that many GWAS risk loci are molecular QTL for gene expression (eQTL), histone modification (hQTL), splicing (sQTL), and/or DNA methylation (meQTL). Here, we introduce a new set of functional annotations based on causal posterior probabilities (CPP) of fine-mapped molecular cis-QTL, using data from the GTEx and BLUEPRINT consortia. We show that these annotations are very strongly enriched for disease heritability across 41 independent diseases and complex traits (average N = 320K): 5.84x for eQTL,...
Most models of complex trait genetic architecture assume that signed causal effect sizes of each SNP (defined with respect to the minor allele) are uncorrelated with those of nearby SNPs, but it is currently unknown whether this is the case. We develop a new method, autocorrelation LD regression (ACLR), for estimating the genome-wide autocorrelation of causal minor allele effect sizes as a function of genomic distance. Our method estimates these autocorrelations by regressing the products of summary statistics on distance-dependent LD scores. We determined that ACLR...
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70...
Common variant heritability is known to be concentrated in variants within cell-type-specific non-coding functional annotations, with a limited role for common coding variants. However, little is known about the functional distribution of low-frequency variant heritability. Here, we partitioned the heritability of both low-frequency (0.5% ≤ MAF < 5%) and common (MAF ≥ 5%) variants in 40 UK Biobank traits (average N = 363K) across a broad set of functional annotations, employing an extension of stratified LD score regression to low-frequency variants that produces robust results in simulations. We determined that non-synonymous...