Eugene Katsevich

ORCID: 0000-0003-0598-2050
Research Areas
  • Statistical Methods in Clinical Trials
  • Gene expression and cancer classification
  • Statistical Methods and Inference
  • Genetic Associations and Epidemiology
  • Single-cell and spatial transcriptomics
  • CRISPR and Genetic Engineering
  • Statistical Methods and Bayesian Inference
  • Bioinformatics and Genomic Networks
  • Advanced Causal Inference Techniques
  • RNA modifications and cancer
  • Advanced biosensing and bioanalysis techniques
  • Biomedical Text Mining and Ontologies
  • Genetic Mapping and Diversity in Plants and Animals
  • Bayesian Methods and Mixture Models
  • Machine Learning and Algorithms
  • Sparse and Compressive Sensing Techniques
  • Genetic and phenotypic traits in livestock
  • VLSI and Analog Circuit Testing
  • Numerical methods in inverse problems
  • Epigenetics and DNA Methylation
  • Molecular Biology Techniques and Applications
  • Electron and X-Ray Spectroscopy Techniques
  • Probability and Risk Models
  • Advanced Electron Microscopy Techniques and Applications
  • Control Systems and Identification

University of Pennsylvania
2018-2024

California University of Pennsylvania
2024

Carnegie Mellon University
2020

Stanford University
2017-2019

Princeton University
2012-2015

Most variants associated with complex traits and diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown effects. Using ancestrally diverse, biobank-scale GWAS data, massively parallel CRISPR screens, and single-cell transcriptomic and proteomic sequencing, we discovered 124

10.1126/science.adh7699 article EN Science 2023-05-04

In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally...

10.1038/s41467-020-14791-2 article EN cc-by Nature Communications 2020-02-27

In cryo-electron microscopy (cryo-EM), a microscope generates a top view of a sample of randomly oriented copies of a molecule. The problem of single particle reconstruction (SPR) from cryo-EM is to use the resulting set of noisy two-dimensional projection images taken at unknown directions to reconstruct the three-dimensional (3D) structure of the molecule. In some situations, the molecule under examination exhibits structural variability, which poses a fundamental challenge in SPR. The heterogeneity problem is the task of mapping the space of conformational states. It...

10.1137/130935434 article EN SIAM Journal on Imaging Sciences 2015-01-01

Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation, identifying three core...

10.1186/s13059-024-03254-2 article EN cc-by Genome biology 2024-05-17

We tackle the problem of selecting from among a large number of variables those that are "important" for an outcome. We consider situations where groups of variables are also of interest. For example, each variable might be a genetic polymorphism, and we want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. In this context, to discover that a variable is relevant for the outcome implies discovering that the larger entity it represents is also important. To guarantee meaningful results with high...

10.1214/18-aoas1185 article EN other-oa The Annals of Applied Statistics 2019-03-01

Over 1,100 independent signals have been identified with genome-wide association studies (GWAS) for bone mineral density (BMD), a key risk factor for mortality-increasing fragility fractures; however, the effector gene(s) for most remain unknown. Informed by a variant-to-gene mapping strategy implicating 89 non-coding elements predicted to regulate osteoblast gene expression at BMD GWAS loci, we executed a single-cell CRISPRi screen in human fetal osteoblast 1.19 cells (hFOBs). The relevance of hFOBs was supported...

10.1101/2024.03.19.585778 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-03-20

In many practical applications, it is desirable to solve the interior problem of tomography without requiring knowledge of the attenuation function f_a on an open set within the region of interest (ROI). It was proved recently that the interior problem has a unique solution if f_a is assumed to be piecewise polynomial on the ROI. In this paper, we tackle the related question of stability. It is well known that lambda tomography allows one to stably recover the locations and values of the jumps of f_a inside the ROI from only the local data. Hence, we consider here the case of a polynomial, rather than piecewise polynomial, f_a on the ROI. Assuming...

10.1088/0266-5611/28/6/065022 article EN Inverse Problems 2012-05-31

We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y⫫X∣Z. The conditional randomization test was recently proposed as a way to use distributional information about X∣Z to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about Y∣(X,Z). This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet...

10.1093/biomet/asab039 article EN Biometrika 2021-07-02

While traditional multiple testing procedures prohibit adaptive analysis choices made by users, Goeman and Solari (2011) proposed a simultaneous inference framework that allows users such flexibility while preserving high-probability bounds on the false discovery proportion (FDP) of the chosen set. In this paper, we propose a new class of FDP bounds, tailored for nested sequences of rejection sets. While most existing bounds are based on closed testing using global null tests based on sorted p-values, we additionally consider the setting...

10.1214/19-aos1938 article EN The Annals of Statistics 2020-12-01

The majority of variants associated with complex traits and common diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown regulatory effects in cis and trans. By leveraging biobank-scale GWAS data, massively parallel CRISPR screens and single cell transcriptome sequencing, we discovered target genes for blood trait loci. The closest gene was often the target gene, but this was not always the case. We also identified trans-effects networks when target genes encoded transcription factors,...

10.1101/2021.04.07.438882 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2021-04-08

Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation, identifying three core challenges:...

10.1101/2023.05.15.540875 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2023-05-15

The Gene Ontology (GO) is a central resource for functional-genomics research. Scientists rely on the functional annotations in the GO for hypothesis generation and couple it with high-throughput biological data to enhance interpretation of results. At the same time, the sheer number of concepts (>30,000) and relationships (>70,000) presents a challenge: it can be difficult to draw a comprehensive picture of how certain concepts of interest might relate to the rest of the ontology structure. Here we present new visualization strategies...

10.1038/s41598-019-42178-x article EN cc-by Scientific Reports 2019-05-24

Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the international classification of diseases (ICD), the directed acyclic graph structure of the gene ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential...

10.1080/01621459.2021.1920958 article EN cc-by-nc-nd Journal of the American Statistical Association 2021-05-05

We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y is independent of X given Z. The conditional randomization test (CRT) was recently proposed as a way to use distributional information about X|Z to exactly (non-asymptotically) control Type-I error using any test statistic in any dimensionality without assuming anything about Y|(X,Z). This flexibility in principle allows one to derive powerful test statistics from complex prediction algorithms while maintaining...

10.48550/arxiv.2006.03980 preprint EN other-oa arXiv (Cornell University) 2020-01-01

For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative insights into the role of machine learning and providing evidence in favor of using likelihood-based statistics in practice. Focusing on the conditional randomization test (CRT), we find that its mode of inference...

10.1214/22-ejs2085 article EN cc-by Electronic Journal of Statistics 2022-01-01

Single-cell CRISPR screens are the most promising biotechnology for mapping regulatory elements to their target genes at genome-wide scale. However, the analysis of these screens presents significant statistical challenges. For example, technical factors like sequencing depth impact not only expression measurement but also perturbation detection, creating a confounding effect. We demonstrate on two recent high multiplicity of infection single-cell CRISPR screens how these challenges cause calibration issues among existing...

10.1101/2020.08.13.250092 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2020-08-14

Simultaneous inference allows for the exploration of data while deciding on criteria for proclaiming discoveries. It was recently proved that all admissible post hoc inference methods for the true discoveries must employ closed testing. In this paper, we investigate efficient closed testing with local tests of a special form: thresholding a function of sums of test scores for the individual hypotheses. Under this special design, we propose a new statistic that quantifies the cost of multiplicity adjustments, and we develop fast (mostly linear‐time) algorithms...

10.1111/sjos.12614 article EN Scandinavian Journal of Statistics 2022-09-06

We present KnockoffZoom, a flexible method for the genetic mapping of complex traits at multiple resolutions. KnockoffZoom localizes causal variants by testing the conditional associations of genetic segments of decreasing width while provably controlling the false discovery rate using artificial genotypes as negative controls. Our method is equally valid for quantitative and binary phenotypes, making no assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that...

10.1101/631390 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2019-05-08

The Gene Ontology (GO) is a central resource for functional-genomics research. Scientists rely on the functional annotations in the GO for hypothesis generation and couple it with high-throughput biological data to enhance interpretation of results. At the same time, the sheer number of concepts (>30,000) and relationships (>70,000) presents a challenge: it can be difficult to draw a comprehensive picture of how certain concepts of interest might relate to the rest of the ontology structure. Here we present new visualization strategies...

10.1101/436741 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2018-10-05

We tackle the problem of selecting from among a large number of variables those that are 'important' for an outcome. We consider situations where groups of variables are also of interest in their own right. For example, each variable might be a genetic polymorphism and we want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. Or, variables might quantify various aspects of the functioning of individual internet servers owned by a company, and we might be interested in assessing the importance of each server as a whole...

10.48550/arxiv.1706.09375 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Motivated by the application of saddlepoint approximations to resampling-based statistical tests, we prove that a Lugannani-Rice style approximation for conditional tail probabilities of averages of conditionally independent random variables has vanishing relative error. We also provide a general condition on the existence and uniqueness of the solution to the corresponding saddlepoint equation. The results are valid under a broad class of distributions involving no restrictions on the smoothness of the distribution function. The derived saddlepoint approximation formula can...

10.48550/arxiv.2407.08915 preprint EN arXiv (Cornell University) 2024-07-11