- Statistical Methods in Clinical Trials
- Gene expression and cancer classification
- Statistical Methods and Inference
- Genetic Associations and Epidemiology
- Single-cell and spatial transcriptomics
- CRISPR and Genetic Engineering
- Statistical Methods and Bayesian Inference
- Bioinformatics and Genomic Networks
- Advanced Causal Inference Techniques
- RNA modifications and cancer
- Advanced biosensing and bioanalysis techniques
- Biomedical Text Mining and Ontologies
- Genetic Mapping and Diversity in Plants and Animals
- Bayesian Methods and Mixture Models
- Machine Learning and Algorithms
- Sparse and Compressive Sensing Techniques
- Genetic and phenotypic traits in livestock
- VLSI and Analog Circuit Testing
- Numerical methods in inverse problems
- Epigenetics and DNA Methylation
- Molecular Biology Techniques and Applications
- Electron and X-Ray Spectroscopy Techniques
- Probability and Risk Models
- Advanced Electron Microscopy Techniques and Applications
- Control Systems and Identification
University of Pennsylvania
2018-2024
California University of Pennsylvania
2024
Carnegie Mellon University
2020
Stanford University
2017-2019
Princeton University
2012-2015
Most variants associated with complex traits and diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown effects. Using ancestrally diverse, biobank-scale GWAS data, massively parallel CRISPR screens, and single-cell transcriptomic and proteomic sequencing, we discovered 124
In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally...
In cryo-electron microscopy (cryo-EM), a microscope generates a top view of a sample of randomly oriented copies of a molecule. The problem of single particle reconstruction (SPR) from cryo-EM is to use the resulting set of noisy two-dimensional projection images taken at unknown directions to reconstruct the three-dimensional (3D) structure of the molecule. In some situations, the molecule under examination exhibits structural variability, which poses a fundamental challenge in SPR. The heterogeneity problem is the task of mapping the space of conformational states of a molecule. It...
Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation of the data, identifying three core...
We tackle the problem of selecting from among a large number of variables those that are "important" for an outcome. We consider situations where groups of variables are also of interest. For example, each variable might be a genetic polymorphism, and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. In this context, to discover that a variable is relevant for the outcome implies discovering that the larger entity it represents is also important. To guarantee meaningful results with high...
Over 1,100 independent signals have been identified with genome-wide association studies (GWAS) for bone mineral density (BMD), a key risk factor for mortality-increasing fragility fractures; however, the effector gene(s) for most remain unknown. Informed by a variant-to-gene mapping strategy implicating 89 non-coding elements predicted to regulate osteoblast gene expression at BMD GWAS loci, we executed a single-cell CRISPRi screen in human fetal osteoblast 1.19 cells (hFOBs). The relevance of hFOBs was supported...
In many practical applications, it is desirable to solve the interior problem of tomography without requiring knowledge of the attenuation function fa on an open set within the region of interest (ROI). It was proved recently that the interior problem has a unique solution if fa is assumed to be piecewise polynomial on the ROI. In this paper, we tackle the related question of stability. It is well known that lambda tomography allows one to stably recover the locations and values of the jumps of fa inside the ROI from only the local data. Hence, we consider here the case of a polynomial, rather than piecewise polynomial, fa on the ROI. Assuming...
We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y⫫X∣Z. The conditional randomization test was recently proposed as a way to use distributional information about X∣Z to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about Y∣(X,Z). This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet...
While traditional multiple testing procedures prohibit adaptive analysis choices made by users, Goeman and Solari (2011) proposed a simultaneous inference framework that allows users such flexibility while preserving high-probability bounds on the false discovery proportion (FDP) of the chosen set. In this paper, we propose a new class of such simultaneous FDP bounds, tailored for nested sequences of rejection sets. While most existing simultaneous FDP bounds are based on closed testing using global null tests based on sorted p-values, we additionally consider the setting...
The majority of variants associated with complex traits and common diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown regulatory effects in cis and trans. By leveraging biobank-scale GWAS data, massively parallel CRISPR screens and single cell transcriptome sequencing, we discovered target genes for blood trait loci. The closest gene was often the target gene, but this was not always the case. We also identified trans-effects networks when the cis target genes encoded transcription factors,...
Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation of the data, identifying three core challenges:...
The Gene Ontology (GO) is a central resource for functional-genomics research. Scientists rely on the functional annotations in the GO for hypothesis generation and couple it with high-throughput biological data to enhance interpretation of results. At the same time, the sheer number of concepts (>30,000) and relationships (>70,000) presents a challenge: it can be difficult to draw a comprehensive picture of how certain concepts of interest might relate with the rest of the ontology structure. Here we present new visualization strategies...
Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the international classification of diseases (ICD), the directed acyclic graph structure of the gene ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential...
We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y is independent of X given Z. The conditional randomization test (CRT) was recently proposed as a way to use distributional information about X|Z to exactly (non-asymptotically) control Type-I error using any test statistic in any dimensionality without assuming anything about Y|(X,Z). This flexibility in principle allows one to derive powerful test statistics from complex prediction algorithms while maintaining...
For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative insights into the role of machine learning and providing evidence in favor of using likelihood-based statistics in practice. Focusing on the conditional randomization test (CRT), we find that its conditional mode of inference...
Single-cell CRISPR screens are the most promising biotechnology for mapping regulatory elements to their target genes at genome-wide scale. However, analysis of these screens presents significant statistical challenges. For example, technical factors like sequencing depth impact not only expression measurement but also perturbation detection, creating a confounding effect. We demonstrate on two recent high multiplicity of infection single-cell CRISPR screens how these challenges cause calibration issues among existing...
Simultaneous inference allows for the exploration of data while deciding on criteria for proclaiming discoveries. It was recently proved that all admissible post hoc inference methods for the true discoveries must employ closed testing. In this paper, we investigate efficient closed testing with local tests of a special form: thresholding a function of sums of test scores for the individual hypotheses. Under this special design, we propose a new statistic that quantifies the cost of multiplicity adjustments, and we develop fast (mostly linear‐time) algorithms...
We present KnockoffZoom, a flexible method for the genetic mapping of complex traits at multiple resolutions. KnockoffZoom localizes causal variants by testing the conditional associations of genetic segments of decreasing width while provably controlling the false discovery rate using artificial genotypes as negative controls. Our method is equally valid for quantitative and binary phenotypes, making no assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that...
We tackle the problem of selecting from among a large number of variables those that are 'important' for an outcome. We consider situations where groups of variables are also of interest in their own right. For example, each variable might be a genetic polymorphism and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. Or, variables might quantify various aspects of the functioning of individual internet servers owned by a company, and we might be interested in assessing the importance of each server as a whole...
Motivated by the application of saddlepoint approximations to resampling-based statistical tests, we prove that a Lugannani-Rice style approximation for conditional tail probabilities of averages of conditionally independent random variables has vanishing relative error. We also provide a general condition on the existence and uniqueness of the solution to the corresponding saddlepoint equation. The results are valid under a broad class of distributions involving no restrictions on the smoothness of the distribution function. The derived saddlepoint approximation formula can...