- Statistical Methods and Inference
- Epigenetics and DNA Methylation
- Advanced Causal Inference Techniques
- Statistical Methods in Clinical Trials
- Vaccine Coverage and Hesitancy
- Statistical Methods and Bayesian Inference
- Birth, Development, and Health
- Neural Networks and Applications
- Fault Detection and Control Systems
- SARS-CoV-2 and COVID-19 Research
- Control Systems and Identification
- COVID-19 Clinical Research Studies
- Health, Environment, Cognitive Aging
- Neonatal Respiratory Health Research
- Face and Expression Recognition
- Childhood Cancer Survivors' Quality of Life
- Reservoir Engineering and Simulation Methods
- Air Quality and Health Impacts
- Acute Lymphoblastic Leukemia Research
- HIV Research and Treatment
- Child Nutrition and Water Access
- Advanced Statistical Methods and Models
- Machine Learning in Healthcare
- Spaceflight Effects on Biology
- Mathematical Biology Tumor Growth
University of California, Berkeley
2020-2025
University of Washington
2022-2025
Fred Hutch Cancer Center
2022-2024
Epigenetic aging biomarkers are associated with increased morbidity and mortality. We evaluated if occupational exposure to three established chemical carcinogens is associated with acceleration of epigenetic aging. We studied workers in China occupationally exposed to benzene, trichloroethylene (TCE) or formaldehyde by measuring personal air exposures prior to blood collection. Unexposed controls matched by age and sex were selected from nearby factories. We measured leukocyte DNA methylation (DNAm) in peripheral white blood cells...
Emerging research suggests associations of physical and psychosocial stressors with epigenetic aging. Although this work has included early-life exposures, data on maternal exposures and epigenetic aging of their children remain sparse. Using data longitudinally collected from the California, Salinas Valley CHAMACOS study, we examined relationships between maternal Adverse Childhood Experiences (ACEs) reported up to 18 years of life, prior to pregnancy, and eight measures of epigenetic aging (Horvath, Hannum, SkinBloodClock, Intrinsic, Extrinsic,...
We propose a unified framework for automatic debiased machine learning (autoDML) to perform inference on smooth functionals of infinite-dimensional M-estimands, defined as population risk minimizers over Hilbert spaces. By automating estimation and inference procedures in causal inference and semiparametric statistics, our framework enables practitioners to construct valid estimators for complex parameters without requiring specialized expertise. The framework supports Neyman-orthogonal loss functions with unknown nuisance parameters requiring data-driven...
Causal mediation analysis with random interventions has become an area of significant interest for understanding time-varying effects with longitudinal and survival outcomes. To tackle causal and statistical challenges due to the complex data structure with confounders, competing risks, and informative censoring, there exists a general desire to combine machine learning techniques and semiparametric theory. In this article, we focus on targeted maximum likelihood estimation (TMLE) of natural direct and indirect...
Background: DNA methylation (DNAm) provides a window to characterize the impacts of environmental exposures and the biological aging process. Epigenetic clocks are often trained on DNAm using penalized regression on CpG sites, but recent evidence suggests potential benefits of training epigenetic predictors on principal components. Methodology/findings: We developed a pipeline to simultaneously train three epigenetic predictors: a traditional CpG Clock, a PCA Clock, and a SuperLearner PCA Clock (SL PCA). We gathered publicly available datasets...
Ensuring model calibration is critical for reliable predictions, yet popular distribution-free methods, such as histogram binning and isotonic regression, provide only asymptotic guarantees. We introduce a unified framework for Venn and Venn-Abers calibration, generalizing Vovk's binary classification approach to arbitrary prediction tasks and loss functions. Venn calibration leverages binning calibrators to construct prediction sets that contain at least one marginally perfectly calibrated point prediction in finite samples, capturing epistemic...
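For readers unfamiliar with the histogram binning baseline named in the abstract above, the following is a minimal sketch in plain Python. It is not the Venn or Venn-Abers procedure from the paper; the function names, the equal-width bins on [0, 1], and the midpoint fallback for empty bins are all assumptions made for illustration.

```python
from bisect import bisect_right


def fit_histogram_binning(scores, labels, n_bins=5):
    """Fit equal-width histogram binning for scores in [0, 1].

    Returns the interior bin edges and the mean label observed in each bin.
    (Hypothetical helper; equal-width bins are an illustrative assumption.)
    """
    edges = [i / n_bins for i in range(1, n_bins)]  # interior edges only
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = bisect_right(edges, s)  # index of the bin containing s
        sums[b] += y
        counts[b] += 1
    # Empty bins fall back to the bin midpoint (an assumption, not a standard rule).
    means = [
        sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]
    return edges, means


def calibrate(score, edges, means):
    """Map a raw score to the mean label of the bin it falls into."""
    return means[bisect_right(edges, score)]


# Toy data: raw scores from some model and the observed binary outcomes.
scores = [0.1, 0.15, 0.6, 0.7, 0.9, 0.95]
labels = [0, 1, 1, 0, 1, 1]
edges, means = fit_histogram_binning(scores, labels, n_bins=2)
calibrated_low = calibrate(0.2, edges, means)
calibrated_high = calibrate(0.8, edges, means)
```

Each calibrated prediction is simply the empirical outcome frequency among calibration points that share its bin, which is why guarantees for this baseline depend on having enough points per bin.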
Astronauts undertaking long-duration space missions may be vulnerable to unique stressors that can impact human aging. Nevertheless, few studies have examined the relationship of mission duration with DNA-methylation-based biomarkers of aging in astronauts. Using data from six participants of the Mars-500 mission, a high-fidelity 520-day ground simulation experiment, we tested relationships of mission duration with five longitudinally measured blood metrics: DNAmGrimAge, DNAmPhenoAge, a DNA-methylation-based estimator of telomere length (DNAmTL),...
Gestational age (GA) is an important determinant of child health and disease risk. Two epigenetic GA clocks have been developed using DNA methylation (DNAm) patterns in cord blood. We investigate the accuracy of the clocks and the determinants of epigenetic GA acceleration (GAA), a biomarker of biological ageing. We hypothesize that prenatal and birth characteristics are associated with altered GAA, thereby disrupting foetal biological ageing. We examined 372 mother-child pairs from the Center for the Health Assessment of Mothers and Children of Salinas study of primarily Latino...
Background: Adverse childhood experiences (ACEs) increase the risk of poor health outcomes later in life. Psychosocial stressors may also have intergenerational effects by which parental ACEs are associated with mental and physical health outcomes in children. Epigenetic programming may be one mechanism linking parental ACEs to child health. This study aimed to investigate epigenome-wide associations of maternal preconception ACEs with DNA methylation patterns in children. In the Center for the Health Assessment of Mothers and Children of Salinas study, cord blood...
Diesel exhaust (DE) is a major contributor to ambient air pollution around the world. It is a known human carcinogen that targets the respiratory system and increases risk for many diseases, but there is limited research on the effects of DE exposure on the epigenome of bronchial epithelial cells. Understanding the epigenetic impact of this environmental pollutant can elucidate the biological mechanisms involved in the pathogenesis of harmful DE-related health effects. To estimate the causal effect of short-term DE exposure on the epigenome, we conducted...
Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. In view of this goal, we consider a covariate-adjusted threshold-based interventional estimand, which happens to equal the binary treatment-specific mean estimand from the causal inference literature obtained by dichotomizing the continuous biomarker or treatment as above or below a threshold. The unadjusted version of this estimand was considered in Donovan et al. Expanding upon Stitelman...
Metformin and weight loss relationships with epigenetic age measures (biological aging biomarkers) remain understudied. We performed a post-hoc analysis of a randomized controlled trial among overweight/obese breast cancer survivors (N = 192) assigned to metformin, placebo, weight loss with metformin, or weight loss with placebo interventions for 6 months. Epigenetic age was correlated with chronological age (r = 0.20–0.86; P < 0.005). However, no significant epigenetic age associations were observed by intervention arms. Consistent with published reports in...
We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that (i) their practical applicability can be hindered by loss function non-convexity; and (ii) they may suffer from poor performance and instability due to...
We consider the problem of estimating the average treatment effect (ATE) when both randomized control trial (RCT) data and real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted minimum loss-based estimation (A-TMLE) framework to estimate them. We prove that the A-TMLE estimator is root-n-consistent and asymptotically normal. Moreover, in finite sample, it achieves...
In causal inference, many estimands of interest can be expressed as a linear functional of the outcome regression function; this includes, for example, average causal effects of static, dynamic and stochastic interventions. For learning such estimands, in this work, we propose novel debiased machine learning estimators that are doubly robust asymptotically linear, thus providing not only doubly robust consistency but also facilitating doubly robust inference (e.g., confidence intervals and hypothesis tests). To do so, we first establish a key link...
Background: Zika virus (ZIKV)-associated congenital microcephaly is an important contributor to pediatric death, and more robust pediatric mortality risk metrics are needed to help guide life plans and clinical decision making for these patients. Although common etiologies of pediatric and adult mortality differ, early life health can impact mortality outcomes, potentially through DNA methylation. Hence, in this pilot study, we take a step in identifying such metrics by examining associations of ZIKV infection and associated microcephaly with existing methylation-based...
In decision-making guided by machine learning, decision-makers often take identical actions in contexts with identical predicted outcomes. Conformal prediction helps decision-makers quantify outcome uncertainty for actions, allowing for better risk management. Inspired by this perspective, we introduce self-consistent conformal prediction, which yields both Venn-Abers calibrated predictions and conformal prediction intervals that are valid conditional on actions prompted by model predictions. Our procedure can be applied post-hoc to any black-box...
Introduction: Pegaspargase (PEG) is a key component of standard regimens for acute lymphoblastic leukemia/lymphoma (ALL) and extranodal natural killer/T-cell lymphoma (NKTCL). Emerging evidence suggests an opportunity to decrease the incidence of PEG-associated toxicities with dose capping, but evidence is limited. This study aims to evaluate whether a significant difference in toxicities related to dosing strategy exists and to identify patient-specific or regimen-specific factors for PEG-related toxicity. Methods: A retrospective analysis...
Inverse weighting with an estimated propensity score is widely used by estimation methods in causal inference to adjust for confounding bias. However, directly inverting propensity score estimates can lead to instability, bias, and excessive variability due to large inverse weights, especially when treatment overlap is limited. In this work, we propose a post-hoc calibration algorithm for inverse propensity weights that generates well-calibrated, stabilized weights from user-supplied, cross-fitted propensity score estimates. Our approach employs a variant of...
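To make the instability described in the abstract above concrete, here is a minimal sketch of plain inverse propensity weighting in Python. It shows the uncalibrated baseline only, not the calibration algorithm the paper proposes; the function names and the toy data are hypothetical, and the self-normalized (Hajek) form of the estimator is an assumption chosen for simplicity.

```python
def inverse_weights(treatment, propensity):
    """Inverse propensity weights: 1/e(x) for treated units, 1/(1 - e(x)) for controls."""
    return [
        1.0 / e if t == 1 else 1.0 / (1.0 - e)
        for t, e in zip(treatment, propensity)
    ]


def ipw_ate(treatment, outcome, propensity):
    """Self-normalized (Hajek) IPW estimate of the average treatment effect."""
    weights = inverse_weights(treatment, propensity)
    treated = [(w, y) for w, y, t in zip(weights, outcome, treatment) if t == 1]
    control = [(w, y) for w, y, t in zip(weights, outcome, treatment) if t == 0]
    mean_treated = sum(w * y for w, y in treated) / sum(w for w, _ in treated)
    mean_control = sum(w * y for w, y in control) / sum(w for w, _ in control)
    return mean_treated - mean_control


# Toy data with good overlap: every propensity estimate equals 0.5.
treatment = [1, 1, 0, 0]
outcome = [3.0, 1.0, 1.0, 0.0]
balanced_estimate = ipw_ate(treatment, outcome, [0.5, 0.5, 0.5, 0.5])

# Limited overlap: one treated unit has an estimated propensity of 0.01,
# so its weight is about 100 and it dominates the treated-arm average.
skewed_weights = inverse_weights(treatment, [0.01, 0.5, 0.5, 0.5])
```

With limited overlap, a single near-zero propensity estimate produces a weight two orders of magnitude larger than the others, which is the kind of variability that motivates stabilizing or calibrating the weights.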
Causal mediation analysis with random interventions has become an area of significant interest for understanding time-varying effects with longitudinal and survival outcomes. To tackle causal and statistical challenges due to the complex data structure with confounders, competing risks, and informative censoring, there exists a general desire to combine machine learning techniques and semiparametric theory. In this manuscript, we focus on targeted maximum likelihood estimation (TMLE) of natural direct and indirect effects defined...
Asymptotic efficiency of targeted maximum likelihood estimators (TMLE) of target features of the data distribution relies on a second order remainder being asymptotically negligible. In previous work we proposed a nonparametric MLE termed Highly Adaptive Lasso (HAL) which parametrizes the relevant functional of the data distribution in terms of a multivariate real valued cadlag function that is assumed to have finite variation norm. We showed that the HAL-MLE converges in Kullback-Leibler dissimilarity at a rate n^(-1/3) up till log n factors....
Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. This risk, viewed as a function of thresholds and possibly adjusted for covariates, we call the threshold-response function. Extending the work of Donovan, Hudgens and Gilbert (2019), we propose a nonparametric efficient estimator for the covariate-adjusted threshold-response function, which utilizes machine learning and Targeted Minimum-Loss Estimation (TMLE). We additionally propose a more general estimator,...
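As a rough illustration of what a threshold-response function looks like, the sketch below computes an unadjusted empirical version in Python: for each threshold, the observed risk among participants whose biomarker is at or above it. This is an assumption-laden simplification, not the covariate-adjusted TMLE estimator described in the abstract above; the function name, the "at or above" convention, and the toy data are hypothetical.

```python
def threshold_response(biomarker, outcome, thresholds):
    """Unadjusted empirical risk among units with biomarker >= each threshold.

    Returns a dict mapping threshold -> observed event proportion, or None
    when no unit reaches the threshold. (Hypothetical helper for illustration.)
    """
    result = {}
    for v in thresholds:
        selected = [y for a, y in zip(biomarker, outcome) if a >= v]
        result[v] = sum(selected) / len(selected) if selected else None
    return result


# Toy data: a continuous biomarker and a binary adverse outcome (1 = event).
biomarker = [1.0, 2.0, 3.0, 4.0]
outcome = [1, 1, 0, 0]
risk_by_threshold = threshold_response(biomarker, outcome, [1.0, 3.0, 5.0])
```

In this toy example the observed risk falls as the threshold rises, and a threshold above every observed biomarker value has no supporting data, so the sketch returns None rather than a number.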