NFDI4DS | UHH-SEMS - Publication Details

Rui Duan

ORCID: 0000-0002-9261-4864

OpenAlex ID: A5089178051

Research Areas

Statistical Methods and Inference
Machine Learning in Healthcare
Advanced Causal Inference Techniques
Statistical Methods and Bayesian Inference
Statistical Methods in Clinical Trials
Genetic Associations and Epidemiology
Meta-analysis and systematic reviews
Privacy-Preserving Technologies in Data
Opioid Use Disorder Treatment
Health Systems, Economic Evaluations, Quality of Life
Substance Abuse Treatment and Outcomes
Pharmaceutical Economics and Policy
COVID-19 Clinical Research Studies
HIV, Drug Use, Sexual Risk
Gene expression and cancer classification
Titanium Alloys Microstructure and Properties
Healthcare Policy and Management
Computational Drug Discovery Methods
Bioinformatics and Genomic Networks
Diabetes Management and Research
Adversarial Robustness in Machine Learning
Pharmacovigilance and Adverse Drug Reactions
Biomedical Text Mining and Ontologies
Hepatitis C virus research
HIV/AIDS Research and Interventions

Affiliations

Harvard University
2019-2025

Harvard University Press
2022-2025

University of Miami
2013-2024

Soochow University
2024

The First People's Hospital of Guiyang
2021-2024

Chongqing Technology and Business University
2024

University of South Florida
2024

Beijing Institute of Aeronautical Materials
2012-2023

South China University of Technology
2022-2023

Xian Yang Central Hospital
2021-2023

Federated Adaptive Causal Estimation (FACE) of Target Treatment Effects

OPENALEX - Publications

Larry Han, Jue Hou, Kelly Cho, Rui Duan, Tianxi Cai

10.1080/01621459.2025.2453249 article EN Journal of the American Statistical Association 2025-01-21

Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm

Rui Duan, Mary Regina Boland, Zixuan Liu, Yue Liu, Howard H. Chang, and 9 more

Objectives: We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites. Materials and Methods: ODAL effectively utilizes the information from the local site (where patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order (ODAL2) gradients of the likelihood function from other sites to construct an estimator without requiring iterative communication or transferring patient-level data. ODAL was evaluated via extensive simulation...

10.1093/jamia/ocz199 article EN Journal of the American Medical Informatics Association 2019-10-23
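To make the one-shot idea concrete, here is a minimal sketch of a first-order surrogate likelihood in the spirit of ODAL1, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: one covariate plus an intercept, simulated data from five hypothetical sites, the local-site MLE used as the initial value beta_bar, and each site sharing its sample size and the gradient of its average log-likelihood at beta_bar exactly once. The local site then maximizes the surrogate L1(beta) + {grad L_N(beta_bar) - grad L1(beta_bar)}' beta. All helper names (avg_gradient, odal1, simulate, etc.) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def avg_gradient(beta, xs, ys):
    """Gradient of the average log-likelihood of P(y=1|x) = sigmoid(b0 + b1*x)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        r = y - sigmoid(beta[0] + beta[1] * x)  # residual y - p
        g0 += r
        g1 += r * x
    return [g0 / len(xs), g1 / len(xs)]


def avg_hessian(beta, xs):
    """Hessian of the average log-likelihood (negative definite)."""
    h00 = h01 = h11 = 0.0
    for x in xs:
        p = sigmoid(beta[0] + beta[1] * x)
        w = p * (1.0 - p)
        h00 -= w
        h01 -= w * x
        h11 -= w * x * x
    n = len(xs)
    return [[h00 / n, h01 / n], [h01 / n, h11 / n]]


def newton_maximize(grad_fn, hess_fn, beta, tol=1e-10, max_iter=100):
    """Maximize a concave function of two parameters by Newton's method."""
    beta = list(beta)
    for _ in range(max_iter):
        g = grad_fn(beta)
        (a, b), (c, d) = hess_fn(beta)
        det = a * d - b * c
        # Newton step H^{-1} g, with the 2x2 inverse written out explicitly
        s0 = (d * g[0] - b * g[1]) / det
        s1 = (-c * g[0] + a * g[1]) / det
        beta = [beta[0] - s0, beta[1] - s1]
        if abs(s0) + abs(s1) < tol:
            break
    return beta


def mle(xs, ys):
    return newton_maximize(lambda b: avg_gradient(b, xs, ys),
                           lambda b: avg_hessian(b, xs), [0.0, 0.0])


def odal1(local, sites, beta_bar):
    """First-order surrogate likelihood estimate computed at the local site.

    Only (n_j, gradient at beta_bar) leaves each site; the raw lists in
    `sites` are used here purely to simulate that one-shot message."""
    messages = [(len(xs), avg_gradient(beta_bar, xs, ys)) for xs, ys in sites]
    n_total = sum(n for n, _ in messages)
    global_grad = [sum(n * g[k] for n, g in messages) / n_total for k in range(2)]
    local_grad = avg_gradient(beta_bar, *local)
    shift = [global_grad[k] - local_grad[k] for k in range(2)]
    # Surrogate L1(beta) + shift . beta: gradient gains a constant, Hessian is local
    return newton_maximize(
        lambda b: [g + s for g, s in zip(avg_gradient(b, *local), shift)],
        lambda b: avg_hessian(b, local[0]),
        beta_bar,
    )


def simulate(n, rng, beta=(-0.5, 1.0)):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(beta[0] + beta[1] * x) else 0 for x in xs]
    return xs, ys


rng = random.Random(2019)
sites = [simulate(200, rng) for _ in range(5)]  # five hypothetical sites
local = sites[0]                                 # site 1 holds its own rows
beta_local = mle(*local)
beta_odal = odal1(local, sites, beta_local)
beta_pooled = mle([x for xs, _ in sites for x in xs],
                  [y for _, ys in sites for y in ys])
print("local :", beta_local)
print("ODAL1 :", beta_odal)
print("pooled:", beta_pooled)
```

Because the correction term is linear in beta, the surrogate keeps the local Hessian and needs only a single round of gradient sharing; the abstract's ODAL2 variant additionally brings in second-order information from the other sites, which this sketch omits.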

Learning from local to global: An efficient distributed algorithm for modeling time-to-event data

Rui Duan, Chongliang Luo, Martijn J. Schuemie, Jiayi Tong, C Jason Liang, and 11 more

Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. Materials and Methods: Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function approximating the Cox partial likelihood function obtained using patient-level data from all sites. By maximizing the surrogate likelihood function, each site obtained a local estimate of the model parameter, and the ODAC estimator was constructed as a weighted average...

10.1093/jamia/ocaa044 article EN Journal of the American Medical Informatics Association 2020-03-28

Heterogeneity-aware and communication-efficient distributed statistical inference

Rui Duan, Yang Ning, Yong Chen

Summary: In multicentre research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed. The existing distributed algorithms usually assume the data are homogeneously distributed across sites. This assumption ignores the important fact that the data collected at different sites may come from various subpopulations and environments, which can lead to heterogeneity in the distribution of the data. Ignoring the heterogeneity may lead to erroneous statistical...

10.1093/biomet/asab007 article EN Biometrika 2021-02-11

Targeting underrepresented populations in precision medicine: A federated transfer learning approach

Sai Li, Tianxi Cai, Rui Duan

The limited representation of minorities and disadvantaged populations in large-scale clinical genomics research poses a significant barrier to translating precision medicine into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse...

10.1214/23-aoas1747 article EN The Annals of Applied Statistics 2023-10-31

Origami plot: a novel multivariate data visualization tool that improves radar chart

Rui Duan, Jiayi Tong, Alex J. Sutton, David A. Asch, Haitao Chu, and 2 more

10.1016/j.jclinepi.2023.02.020 article EN Journal of Clinical Epidemiology 2023-02-22

Data fusion using weakly aligned sources

Sijia Li, Peter B. Gilbert, Rui Duan, Alex Luedtke

10.1080/01621459.2025.2476780 article EN Journal of the American Statistical Association 2025-03-13

SurvMaximin: Robust federated approach to transporting survival risk prediction models

Xuan Wang, Harrison G. Zhang, Xin Xiong, Chuan Hong, Griffin M. Weber, and 53 more

10.1016/j.jbi.2022.104176 article EN publisher-specific-oa Journal of Biomedical Informatics 2022-08-23

A novel grey prediction model with four-parameter and its application to forecast natural gas production in China

Nannan Song, Shuliang Li, Bo Zeng, Rui Duan, Yingjie Yang

10.1016/j.engappai.2024.108431 article EN Engineering Applications of Artificial Intelligence 2024-04-25

Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores

Chenyin Gao, Justin D. Tubbs, Yi Han, Min Guo, Sijia Li, and 5 more

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, heterogeneity, and the scarcity of observed phenotype in real-world settings. Ensemble learning offers a promising avenue to enhance predictive accuracy in genetic risk assessments,...

10.1101/2025.01.06.25320058 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2025-01-06

Adaptive Transfer Learning for Time-to-Event Modeling with Applications in Disease Risk Assessment

Yuying Lu, Tian Gu, Rui Duan

Objective: To address the challenges in modeling time-to-event outcomes in small-sample settings by leveraging transfer learning techniques while accounting for potential covariate and concept shifts between source and target datasets. Methods: We propose a novel approach, termed CoxTL, for modeling time-to-event data based on the widely used Cox proportional hazards model. CoxTL utilizes a combination of density ratio weighting and importance weighting to address multi-level heterogeneity, including covariate and coefficient shifts. Additionally, it accounts for model...

10.1101/2025.01.14.25320536 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2025-01-15

U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms

Rui Duan

Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration. However, selecting the best model for a specific target population remains challenging due to issues like limited transferability, heterogeneity, and the difficulty of obtaining true labels or outcomes in real-world settings. In...

10.48550/arxiv.2501.18084 preprint EN arXiv (Cornell University) 2025-01-29

Transdiagnostic Polygenic Risk Models for Psychopathology and Comorbidity: Cross-Ancestry Analysis in the All of Us Research Program

Phil H. Lee, Jae-Yoon Jung, Brandon T. Sanzo, Rui Duan, Irwin D. Waldman, and 5 more

Psychiatric disorders exhibit substantial genetic overlap, raising questions about the utility of transdiagnostic risk models. Using data from the All of Us Research Program (N=102,091), we evaluated common psychiatric genetic (CPG) factor-based polygenic risk scores (PRSs) compared to standard disorder-specific PRSs. The CPG PRS consistently outperformed disorder-specific PRSs in predicting individual disorder risk, explaining 1.07 to 24.6 times more phenotypic variance across 11 conditions. Meanwhile, many PRSs retained independent...

10.1101/2025.03.26.25324720 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2025-03-27

Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores

Rui Duan, Chenyin Gao, Justin D. Tubbs, Yi Han, Min Guo, and 5 more

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, heterogeneity, and the scarcity of observed phenotype in real-world settings. Ensemble learning offers a promising avenue to enhance predictive...

10.21203/rs.3.rs-5976048/v1 preprint EN 2025-04-01

ODAL: A one-shot distributed algorithm to perform logistic regressions on electronic health records data from multiple clinical sites

Rui Duan, Mary Regina Boland, Jason H. Moore, Yong Chen

10.1142/9789813279827_0004 article EN Biocomputing 2018-11-01

Hospitalizations Associated With Mental Health Conditions Among Adolescents in the US and France During the COVID-19 Pandemic

Alba Gutiérrez‐Sacristán, Arnaud Serret-Larmande, Meghan R. Hutch, Carlos Sáez, Bruce J. Aronow, and 95 more

Importance: The COVID-19 pandemic has been associated with an increase in mental health diagnoses among adolescents, though the extent of the increase, particularly for severe cases requiring hospitalization, has not been well characterized. Large-scale federated informatics approaches provide the ability to efficiently and securely query health care data sets to assess and monitor hospitalization patterns for mental health conditions among adolescents. Objective: To estimate changes in the proportion of hospitalizations associated with mental health conditions among adolescents following the onset...

10.1001/jamanetworkopen.2022.46548 article EN cc-by-nc-nd JAMA Network Open 2022-12-13

DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models

Chongliang Luo, Md. Nazmul Islam, Natalie E. Sheils, John Buresh, Jenna Reps, and 23 more

Linear mixed models are commonly used in healthcare-based association analyses for analyzing multi-site data with heterogeneous site-specific random effects. Due to regulations protecting patients' privacy, sensitive individual patient data (IPD) typically cannot be shared across sites. We propose an algorithm for fitting distributed linear mixed models (DLMMs) without sharing IPD across sites. This algorithm achieves results identical to those achieved using pooled IPD from multiple sites (i.e., the same effect size and standard error...

10.1038/s41467-022-29160-4 article EN cc-by Nature Communications 2022-03-30
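The "identical to pooled" property can be illustrated in its simplest possible form. For ordinary least squares without random effects, the per-site aggregates X'X and X'y add up to the pooled X'X and X'y, so a one-shot fit from site summaries matches the pooled fit up to floating-point rounding. The sketch below is a simplified, hypothetical illustration of that losslessness idea and not DLMM itself, which additionally handles heterogeneous site-specific random effects; the simulated site shifts and the helper names (site_summary, solve_from_summaries) are assumptions.

```python
import random


def site_summary(xs, ys):
    """Aggregates one site could share instead of patient-level rows:
    X'X and X'y for the design matrix X = [1, x]."""
    n, sx = len(xs), sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return [[n, sx], [sx, sxx]], [sy, sxy]


def solve_from_summaries(summaries):
    """Sum per-site X'X and X'y, then solve the 2x2 normal equations."""
    xtx = [[0.0, 0.0], [0.0, 0.0]]
    xty = [0.0, 0.0]
    for a, b in summaries:
        for i in range(2):
            xty[i] += b[i]
            for j in range(2):
                xtx[i][j] += a[i][j]
    (p, q), (r, s) = xtx
    det = p * s - q * r
    return [(s * xty[0] - q * xty[1]) / det, (-r * xty[0] + p * xty[1]) / det]


rng = random.Random(7)
sites = []
for site_shift in (-1.0, 0.0, 0.5):  # hypothetical site-level intercept shifts
    xs = [rng.uniform(0.0, 10.0) for _ in range(50)]
    ys = [2.0 + 0.3 * x + site_shift + rng.gauss(0.0, 1.0) for x in xs]
    sites.append((xs, ys))

# One-shot: each site sends only its summary. Plain OLS ignores the site
# shifts, which DLMM would instead model as site-specific random effects.
distributed = solve_from_summaries([site_summary(xs, ys) for xs, ys in sites])
pooled = solve_from_summaries([site_summary(
    [x for xs, _ in sites for x in xs],
    [y for _, ys in sites for y in ys])])
print("distributed:", distributed)
print("pooled     :", pooled)
```

The same principle, that the likelihood depends on the data only through summaries each site can compute locally, is what allows a one-shot algorithm to be lossless rather than approximate.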

Characterization of long COVID temporal sub-phenotypes by distributed representation learning from electronic health record data: a cohort study

Arianna Dagliati, Zachary H. Strasser, Zahra Shakeri Hossein Abad, Jeffrey G. Klann, Kavishwar B. Wagholikar, and 95 more

Characterizing Post-Acute Sequelae of COVID (SARS-CoV-2 Infection), or...

10.1016/j.eclinm.2023.102210 article EN cc-by EClinicalMedicine 2023-09-15

A privacy-preserving and computation-efficient federated algorithm for generalized linear mixed models to analyze correlated electronic health records data

Zhiyu Yan, Kori S. Zachrison, Lee H. Schwamm, Juan Estrada, Rui Duan

Large collaborative research networks provide opportunities to jointly analyze multicenter electronic health record (EHR) data, which can improve the sample size, diversity of the study population, and generalizability of the results. However, there are challenges to analyzing multicenter EHR data, including privacy protection, large-scale computation resource requirements, heterogeneity across sites, and correlated observations. In this paper, we propose a federated algorithm for generalized linear mixed models...

10.1371/journal.pone.0280192 article EN cc-by PLoS ONE 2023-01-17
