- Statistical Methods and Inference
- Statistical Methods and Bayesian Inference
- Statistical Methods in Clinical Trials
- Advanced Causal Inference Techniques
- Bayesian Methods and Mixture Models
- Health Systems, Economic Evaluations, Quality of Life
- Gene expression and cancer classification
- Cancer Genomics and Diagnostics
- Cancer Cells and Metastasis
- Machine Learning and Data Classification
- Optimal Experimental Design Methods
- Single-cell and spatial transcriptomics
- Meta-analysis and systematic reviews
- Advanced Statistical Process Monitoring
- Machine Learning and Algorithms
- Scientific Computing and Data Management
- Healthcare cost, quality, practices
- Gaussian Processes and Bayesian Inference
- Viral Infectious Diseases and Gene Expression in Insects
- Advanced Text Analysis Techniques
- Control Systems and Identification
- Molecular Biology Techniques and Applications
- Dam Engineering and Safety
- Evolution and Genetic Dynamics
- Probabilistic and Robust Engineering Design
University of Chicago
2023-2025
Stanford University
2019-2022
European Molecular Biology Laboratory
2016
European Bioinformatics Institute
2015
Heidelberg University
2014
Abstract Elucidating the spectrum of epithelial-mesenchymal transition (EMT) and mesenchymal-epithelial transition (MET) states in clinical samples promises insights on cancer progression and drug resistance. Using mass cytometry time-course analysis, we resolve lung cancer EMT through TGFβ treatment and identify, upon TGFβ withdrawal, a distinct MET state. We demonstrate significant differences between trajectories using a computational tool (TRACER) for reconstructing cell states. In addition, we construct a reference map...
A fundamental task in the analysis of datasets with many variables is screening for associations. This can be cast as a multiple testing task, where the objective is to achieve high detection power while controlling type I error. We consider $m$ hypothesis tests represented by pairs $((P_i, X_i))_{1\leq i \leq m}$ of p-values $P_i$ and covariates $X_i$, such that $P_i \perp X_i$ if $H_i$ is null. Here, we show how to use the information potentially available in the covariates about heterogeneities among the hypotheses to increase...
Production of indigoidine can be enhanced by swapping a synthetic T domain into the NRPS IndC.
ABSTRACT Elucidating a continuum of epithelial-mesenchymal transition (EMT) and mesenchymal-epithelial transition (MET) states in clinical samples promises new insights into cancer progression and drug response. Using mass cytometry time-course analysis, we resolve lung cancer EMT through TGFβ treatment and identify, upon TGFβ withdrawal, a previously unrealized MET state. We demonstrate significant differences between trajectories using a novel computational tool (TRACER) for reconstructing cell states. Additionally, we construct...
A growing body of work uses the paradigm of algorithmic fairness to frame the development of techniques that anticipate and proactively mitigate the introduction or exacerbation of health inequities that may follow from the use of model-guided decision-making. We evaluate the interplay between measures of model performance, fairness, and the expected utility of decision-making, and offer practical recommendations for the operationalization of these principles in the evaluation of predictive models in healthcare. We conduct an empirical case study via models that estimate ten-year...
Abstract Regression discontinuity designs assess causal effects in settings where treatment is determined by whether an observed running variable crosses a prespecified threshold. Here we propose a new approach to identification, estimation, and inference in regression discontinuity designs that uses knowledge about exogenous noise (e.g., measurement error) in the running variable. In our strategy, we weight treated and control units to balance a latent variable of which the running variable is a noisy measure. Our approach is driven by the effective randomization provided by the exogenous noise in the running variable,...
Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI's benefits for individual tasks, modern applications require answering numerous parallel questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI and empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task...
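To make the PPI building block concrete, here is a minimal numpy sketch of the basic prediction-powered mean estimator (the ingredient PAS builds on, not PAS itself). The data-generating setup, the bias of the simulated model, and all variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a small gold-standard sample (labels y) and a large
# unlabeled sample, both scored by an ML model. The model is simulated as
# the truth plus a systematic bias of +0.5, to show that PPI removes it.
n, N, theta = 100, 10_000, 2.0
y_labeled = rng.normal(theta, 1.0, size=n)                    # gold-standard labels
f_labeled = y_labeled + 0.5 + rng.normal(0.0, 0.2, size=n)    # predictions on labeled data
f_unlabeled = rng.normal(theta + 0.5, 1.0, size=N)            # predictions on unlabeled data

# Prediction-powered estimate of the mean:
# large-sample prediction average plus a small-sample bias correction ("rectifier").
theta_ppi = f_unlabeled.mean() + (y_labeled - f_labeled).mean()

# Baselines for comparison: the ML-only average inherits the model's bias,
# while the classical average is unbiased but uses only n observations.
theta_naive_ml = f_unlabeled.mean()
theta_classical = y_labeled.mean()
```

In this simulation `theta_ppi` recovers the true mean of 2.0 while the ML-only average stays near the biased value of 2.5, with the rectifier's variance governed by the small labeled sample.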
Summary We study how to combine p-values and e-values, and design multiple testing procedures where both p-values and e-values are available for every hypothesis. Our results provide a new perspective on multiple testing with data-driven weights: while standard weighted multiple testing methods require the weights to deterministically add up to the number of hypotheses being tested, we show that this normalization is not required when the weights are e-values independent of the p-values. Such e-values can be obtained in meta-analysis, with a primary dataset used to compute p-values and an independent secondary...
Abstract Hypothesis weighting is a powerful approach for improving the power of data analyses that employ multiple testing. However, in general it is not evident how to choose weights in a data-dependent manner. We describe independent hypothesis weighting (IHW), a method that makes use of informative covariates that are independent of the test statistic under the null hypothesis, but informative of each test's power or prior probability of the null hypothesis. Covariates can be continuous or categorical and need not fulfill any particular assumptions. The method increases statistical power in applications...
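The classical ingredient behind hypothesis weighting is the weighted Benjamini-Hochberg procedure, sketched below with fixed weights. This is only the mechanics of weighting: IHW itself learns the weights from a covariate with cross-fitting, which is not reproduced here, and the example p-values and weights are invented:

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg with user-supplied nonnegative weights.

    Weights are assumed to average to one; hypotheses with larger weights
    face a more lenient threshold via the weighted p-values p_i / w_i.
    """
    w = np.asarray(weights, dtype=float)
    assert np.isclose(w.mean(), 1.0), "weights are assumed to average to one"
    p = np.minimum(np.asarray(pvals, dtype=float) / np.maximum(w, 1e-12), 1.0)
    m = len(p)
    order = np.argsort(p)
    # BH step-up: largest k with p_(k) <= alpha * k / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Hypothetical example: the first hypothesis is up-weighted by its covariate.
reject = weighted_bh([0.001, 0.2, 0.04, 0.5], [2.0, 0.5, 1.0, 0.5])
```

Here hypotheses 1 and 3 are rejected at level 0.1; in IHW the weight vector would instead be chosen as a data-driven function of the covariates.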
In an empirical Bayes analysis, we use data from repeated sampling to imitate inferences made by an oracle Bayesian with extensive knowledge of the data-generating distribution. Existing results provide a comprehensive characterization of when and why point estimates accurately recover oracle behavior. In this paper, we develop flexible and practical confidence intervals that provide asymptotic frequentist coverage of empirical Bayes estimands, such as the posterior mean or the local false sign rate. The coverage statements hold even when the estimands are only...
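A minimal sketch of the kind of estimand discussed here, under an assumed normal-normal model: the oracle posterior mean is a linear shrinkage of each observation, and an empirical Bayes analyst plugs in a method-of-moments estimate of the prior variance. All parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Normal-normal model: mu_i ~ N(0, A), X_i | mu_i ~ N(mu_i, 1).
# The oracle posterior mean is E[mu_i | X_i = x] = A / (A + 1) * x.
A, m = 4.0, 50_000
mu = rng.normal(0.0, np.sqrt(A), size=m)
x = mu + rng.normal(0.0, 1.0, size=m)

# Empirical Bayes: estimate A from the marginal variance (Var X = A + 1)
# and plug it into the oracle shrinkage rule.
A_hat = max(x.var() - 1.0, 0.0)
shrink_oracle = A / (A + 1.0)        # = 0.8 for A = 4
shrink_eb = A_hat / (A_hat + 1.0)

posterior_mean_eb = shrink_eb * x    # plug-in estimate of each E[mu_i | X_i]
```

The confidence intervals described in the abstract quantify the uncertainty in such plug-in estimates of posterior quantities, which this point-estimate sketch does not attempt.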
We study methods for the simultaneous analysis of many noisy experiments in the presence of rich covariate information. The goal of the analyst is to optimally estimate the true effect underlying each experiment. Both the experimental results and the auxiliary covariates are useful for this purpose, but neither data source on its own captures all the information available to the analyst. In this paper, we propose a flexible plug-in empirical Bayes estimator that synthesizes both sources of information and may leverage any black-box predictive model. We show...
Regression discontinuity designs assess causal effects in settings where treatment is determined by whether an observed running variable crosses a pre-specified threshold. Here we propose a new approach to identification, estimation, and inference in regression discontinuity designs that uses knowledge about exogenous noise (e.g., measurement error) in the running variable. In our strategy, we weight treated and control units to balance a latent variable of which the running variable is a noisy measure. Our approach is explicitly randomization-based and complements standard formal analyses...
Abstract Background Non-experimental studies (also known as observational studies) are valuable for estimating the effects of various medical interventions, but are notoriously difficult to evaluate because the methods used in non-experimental studies require untestable assumptions. This lack of intrinsic verifiability makes it difficult both to compare different non-experimental study methods and to trust the results of any particular non-experimental study. Methods We introduce TrialProbe, a data resource and statistical framework for the evaluation of non-experimental methods. We first collect a dataset...
In large-scale studies with parallel signal-plus-noise observations, the local false discovery rate is a summary statistic that is often presumed to be equal to the posterior probability that the signal is null. We prefer to call the latter quantity the null-signal probability to emphasize our view that a null hypothesis and a null signal are not identical events. The local false discovery rate is commonly estimated through empirical Bayes procedures that build on the 'zero density assumption', which attributes the density of observations near zero entirely to null signals. In this paper, we argue that this strategy does not furnish estimates...
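For reference, the standard two-groups quantity discussed in this abstract can be written down exactly in a toy oracle setting (this sketch takes the mixture parameters as known and does not implement the paper's estimation argument; all parameter values are assumptions):

```python
import math

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Two-groups model: with probability pi0 the signal is exactly zero ("null"),
# otherwise mu ~ N(0, A); we observe X = mu + N(0, 1) noise.
pi0, A = 0.9, 4.0

def lfdr(x):
    """Oracle local false discovery rate: P(signal is null | X = x).

    lfdr(x) = pi0 f0(x) / (pi0 f0(x) + (1 - pi0) f1(x)), where f0 = N(0, 1)
    is the null marginal and f1 = N(0, 1 + A) the alternative marginal of X.
    """
    f0 = normal_pdf(x, 1.0)
    f1 = normal_pdf(x, math.sqrt(1.0 + A))
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)
```

Observations near zero have lfdr close to pi0, while large observations have lfdr near zero; the estimation question raised in the abstract is how to recover such curves from data without the zero density assumption.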
Abstract Despite the growing concerns about the replicability of ecological and evolutionary studies, no results exist from a field-wide replication project. We conduct a large-scale in silico replication project, leveraging cutting-edge statistical methodologies. Replicability is 30%–40% for studies with marginal significance in the absence of selective reporting, whereas for studies presenting 'strong' evidence against the null hypothesis $H_0$ it is >70%. The former require a sevenfold larger sample size to reach the latter's replicability....
We demonstrate how data fission, a method for creating synthetic replicates from single observations, can be applied to empirical Bayes estimation. This extends recent work on empirical Bayes with multiple replicates to the classical single-replicate setting. The key insight is that, after data fission, empirical Bayes estimation can be cast as a general regression problem.
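A minimal sketch of the Gaussian data-fission construction this abstract appears to build on (following Leiner et al.; the split parameter tau and the downstream use of the two pieces are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Gaussian data fission: split a single observation Z ~ N(mu, s^2) into two
# independent pieces by adding and subtracting scaled external noise
# W ~ N(0, s^2):
#   Z1 = Z + tau * W  ~ N(mu, s^2 * (1 + tau^2))
#   Z2 = Z - W / tau  ~ N(mu, s^2 * (1 + 1/tau^2))
# Cov(Z1, Z2) = s^2 - s^2 = 0, and joint Gaussianity gives independence.
mu, s, tau, n = 1.0, 1.0, 1.0, 200_000
z = rng.normal(mu, s, size=n)
w = rng.normal(0.0, s, size=n)
z1 = z + tau * w    # e.g. used to fit an empirical Bayes prior / regression
z2 = z - w / tau    # held out for estimation or inference

emp_cov = np.cov(z1, z2)[0, 1]   # empirically close to zero
```

Both synthetic replicates retain the mean mu, which is what lets a single-replicate empirical Bayes problem be recast as a regression of one piece on the other.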