- Statistical Methods and Inference
- Machine Learning in Healthcare
- Advanced Causal Inference Techniques
- Statistical Methods and Bayesian Inference
- Statistical Methods in Clinical Trials
- Genetic Associations and Epidemiology
- Meta-analysis and systematic reviews
- Privacy-Preserving Technologies in Data
- Opioid Use Disorder Treatment
- Health Systems, Economic Evaluations, Quality of Life
- Substance Abuse Treatment and Outcomes
- Pharmaceutical Economics and Policy
- COVID-19 Clinical Research Studies
- HIV, Drug Use, Sexual Risk
- Gene expression and cancer classification
- Titanium Alloys Microstructure and Properties
- Healthcare Policy and Management
- Computational Drug Discovery Methods
- Bioinformatics and Genomic Networks
- Diabetes Management and Research
- Adversarial Robustness in Machine Learning
- Pharmacovigilance and Adverse Drug Reactions
- Biomedical Text Mining and Ontologies
- Hepatitis C virus research
- HIV/AIDS Research and Interventions
Harvard University
2019-2025
Harvard University Press
2022-2025
University of Miami
2013-2024
Soochow University
2024
The First People's Hospital of Guiyang
2021-2024
Chongqing Technology and Business University
2024
University of South Florida
2024
Beijing Institute of Aeronautical Materials
2012-2023
South China University of Technology
2022-2023
Xian Yang Central Hospital
2021-2023
Objectives: We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites. Materials and Methods: ODAL effectively utilizes the information from the local site (where patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order (ODAL2) gradients of the likelihood function from other sites to construct an estimator without requiring iterative communication across sites or transferring patient-level data. ODAL was evaluated via extensive simulation...
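The first-order idea in the abstract above can be illustrated with a minimal sketch. It assumes the generic surrogate-likelihood construction (local log-likelihood plus a linear correction built from gradients that every site evaluates at a shared initial estimate); the abstract does not spell out ODAL's exact form, so this is an illustration rather than the authors' implementation. The simulated sites, the Newton solver, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loglik(beta, X, y):
    """Gradient of the average logistic log-likelihood at beta."""
    return X.T @ (y - sigmoid(X @ beta)) / len(y)

def hessian_loglik(beta, X):
    """Hessian of the average logistic log-likelihood (negative definite)."""
    p = sigmoid(X @ beta)
    return -(X.T * (p * (1.0 - p))) @ X / len(p)

def newton_maximize(X, y, shift, beta0, n_iter=50):
    """Maximize avg_loglik(beta) + shift @ beta with Newton's method."""
    beta = beta0.copy()
    for _ in range(n_iter):
        g = grad_loglik(beta, X, y) + shift
        step = np.linalg.solve(hessian_loglik(beta, X), g)
        beta = beta - step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated data for three hypothetical sites (assumption, not from the source).
rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0, -1.0])
sites = []
for n in (500, 800, 700):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = rng.binomial(1, sigmoid(X @ true_beta)).astype(float)
    sites.append((X, y))

# Step 1: the local site fits its own model to get an initial estimate.
X1, y1 = sites[0]
beta_bar = newton_maximize(X1, y1, np.zeros(3), np.zeros(3))

# Step 2: every site reports only its gradient at beta_bar and its sample size.
n_total = sum(len(y) for _, y in sites)
global_grad = sum(len(y) * grad_loglik(beta_bar, X, y) for X, y in sites) / n_total

# Step 3: the local site maximizes its log-likelihood plus a linear correction.
shift = global_grad - grad_loglik(beta_bar, X1, y1)
beta_odal1 = newton_maximize(X1, y1, shift, beta_bar)
print("local estimate:     ", beta_bar)
print("surrogate estimate: ", beta_odal1)
```

Only one round of communication is needed in this sketch: the other sites send a gradient vector and a count, never patient-level rows.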
Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. Materials and Methods: Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the partial likelihood function obtained using data from all sites. By maximizing the surrogate likelihood function at each site, we obtained a local estimate of the parameter, and the ODAC estimator was constructed as a weighted average...
Summary: In multicentre research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed. The existing distributed algorithms usually assume the data are homogeneously distributed across sites. This assumption ignores the important fact that the data collected at different sites may come from various subpopulations and environments, which can lead to heterogeneity in the distribution of the data. Ignoring the heterogeneity may lead to erroneous statistical...
The limited representation of minorities and disadvantaged populations in large-scale clinical genomics research poses a significant barrier to translating precision medicine into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse...
The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, data heterogeneity, and the scarcity of observed phenotype in real-world settings. Ensemble learning offers a promising avenue to enhance the predictive accuracy of genetic risk assessments,...
Objective: To address the challenges in modeling time-to-event outcomes in small-sample settings by leveraging transfer learning techniques while accounting for potential covariate and concept shifts between source and target datasets. Methods: We propose a novel transfer learning approach, termed CoxTL, for modeling time-to-event data based on the widely used Cox proportional hazards model. CoxTL utilizes a combination of density ratio weighting and importance weighting to address multi-level data heterogeneity, including coefficient shifts. Additionally, it accounts for potential model...
Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration. However, selecting the best model for a specific target population remains challenging due to issues like limited transferability, data heterogeneity, and the difficulty of obtaining true labels or outcomes in real-world settings. In...
Psychiatric disorders exhibit substantial genetic overlap, raising questions about the utility of transdiagnostic genetic risk models. Using data from the All of Us Research Program (N=102,091), we evaluated common psychiatric genetic (CPG) factor-based polygenic risk scores (PRSs) compared to standard disorder-specific PRSs. The CPG PRS consistently outperformed disorder-specific PRSs in predicting individual disorder risk, explaining 1.07 to 24.6 times more phenotypic variance across 11 conditions. Meanwhile, many disorder-specific PRSs retained independent...
The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, data heterogeneity, and the scarcity of observed phenotype in real-world settings. Ensemble learning offers a promising avenue to enhance the predictive...
Importance: The COVID-19 pandemic has been associated with an increase in mental health diagnoses among adolescents, though the extent of the increase, particularly for severe cases requiring hospitalization, has not been well characterized. Large-scale federated informatics approaches provide the ability to efficiently and securely query health care data sets to assess and monitor hospitalization patterns for mental health conditions among adolescents. Objective: To estimate changes in the proportion of hospitalizations associated with mental health conditions among adolescents following the onset...
Linear mixed models are commonly used in healthcare-based association analyses for analyzing multi-site data with heterogeneous site-specific random effects. Due to regulations protecting patients' privacy, sensitive individual patient data (IPD) typically cannot be shared across sites. We propose an algorithm for fitting distributed linear mixed models (DLMMs) without sharing IPD across sites. This algorithm achieves results identical to those achieved using pooled IPD from multiple sites (i.e., the same effect size and standard error...
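The claim that aggregated data can reproduce pooled results exactly is easiest to see in a simplified setting. The sketch below uses ordinary linear regression with fixed effects only, which is an assumption made for illustration; it is not the DLMM algorithm and omits the site-specific random effects the abstract describes. Each hypothetical site shares only the summaries X'X, X'y, y'y, and its sample size.

```python
import numpy as np

def site_summary(X, y):
    """Aggregated statistics a site would share instead of patient-level rows."""
    return {"XtX": X.T @ X, "Xty": X.T @ y, "yty": float(y @ y), "n": len(y)}

def fit_from_summaries(summaries):
    """Least-squares estimates and standard errors from summed summaries."""
    XtX = sum(s["XtX"] for s in summaries)
    Xty = sum(s["Xty"] for s in summaries)
    yty = sum(s["yty"] for s in summaries)
    n = sum(s["n"] for s in summaries)
    p = XtX.shape[0]
    beta = np.linalg.solve(XtX, Xty)
    # Residual sum of squares expressed through the summaries only.
    rss = yty - 2.0 * beta @ Xty + beta @ XtX @ beta
    sigma2 = rss / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtX)))
    return beta, se

# Simulated data for three hypothetical sites (assumption, not from the source).
rng = np.random.default_rng(1)
true_beta = np.array([2.0, -1.0, 0.5])
sites = []
for n in (150, 250, 200):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ true_beta + rng.normal(size=n)
    sites.append((X, y))

# Distributed fit: only the summaries leave each site.
beta_dist, se_dist = fit_from_summaries([site_summary(X, y) for X, y in sites])

# Pooled fit on stacked patient-level data, for comparison.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
resid = y_all - X_all @ beta_pooled
sigma2_pooled = resid @ resid / (len(y_all) - X_all.shape[1])
se_pooled = np.sqrt(sigma2_pooled * np.diag(np.linalg.inv(X_all.T @ X_all)))

print("max |beta difference|:", np.max(np.abs(beta_dist - beta_pooled)))
print("max |SE difference|:  ", np.max(np.abs(se_dist - se_pooled)))
```

In this simplified case the two fits agree up to floating-point error because the summed summaries are sufficient statistics for the linear model.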
Characterizing Post-Acute Sequelae of COVID (SARS-CoV-2 Infection), or
Large collaborative research networks provide opportunities to jointly analyze multicenter electronic health record (EHR) data, which can improve the sample size, diversity of the study population, and generalizability of the results. However, there are challenges in analyzing multicenter EHR data, including privacy protection, large-scale computation resource requirements, heterogeneity across sites, and correlated observations. In this paper, we propose a federated algorithm for generalized linear mixed models...