Rui Duan

ORCID: 0000-0002-9261-4864
Research Areas
  • Statistical Methods and Inference
  • Machine Learning in Healthcare
  • Advanced Causal Inference Techniques
  • Statistical Methods and Bayesian Inference
  • Statistical Methods in Clinical Trials
  • Genetic Associations and Epidemiology
  • Meta-analysis and systematic reviews
  • Privacy-Preserving Technologies in Data
  • Opioid Use Disorder Treatment
  • Health Systems, Economic Evaluations, Quality of Life
  • Substance Abuse Treatment and Outcomes
  • Pharmaceutical Economics and Policy
  • COVID-19 Clinical Research Studies
  • HIV, Drug Use, Sexual Risk
  • Gene expression and cancer classification
  • Titanium Alloys Microstructure and Properties
  • Healthcare Policy and Management
  • Computational Drug Discovery Methods
  • Bioinformatics and Genomic Networks
  • Diabetes Management and Research
  • Adversarial Robustness in Machine Learning
  • Pharmacovigilance and Adverse Drug Reactions
  • Biomedical Text Mining and Ontologies
  • Hepatitis C virus research
  • HIV/AIDS Research and Interventions

Harvard University
2019-2025

Harvard University Press
2022-2025

University of Miami
2013-2024

Soochow University
2024

The First People's Hospital of Guiyang
2021-2024

Chongqing Technology and Business University
2024

University of South Florida
2024

Beijing Institute of Aeronautical Materials
2012-2023

South China University of Technology
2022-2023

Xian Yang Central Hospital
2021-2023

10.1080/01621459.2025.2453249 article EN Journal of the American Statistical Association 2025-01-21

Abstract. Objectives: We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites. Materials and Methods: ODAL effectively utilizes the information from the local site (where patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order (ODAL2) gradients of the likelihood function from other sites to construct an estimator without requiring iterative communication or transferring data. ODAL is evaluated via extensive simulation...

10.1093/jamia/ocz199 article EN Journal of the American Medical Informatics Association 2019-10-23

Abstract. Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. Materials and Methods: Using data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the partial likelihood function obtained using data from all sites. By maximizing each surrogate likelihood function, each site obtained a local estimate of the parameter, and the ODAC estimator was constructed as a weighted average...

10.1093/jamia/ocaa044 article EN Journal of the American Medical Informatics Association 2020-03-28

Summary: In multicentre research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed. The existing algorithms usually assume the data are homogeneously distributed across sites. This assumption ignores the important fact that data collected at different sites may come from various subpopulations and environments, which can lead to heterogeneity in the distribution of the data. Ignoring the heterogeneity may lead to erroneous statistical...

10.1093/biomet/asab007 article EN Biometrika 2021-02-11

The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research poses a significant barrier to translating precision medicine into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse...

10.1214/23-aoas1747 article EN The Annals of Applied Statistics 2023-10-31

10.1080/01621459.2025.2476780 article EN Journal of the American Statistical Association 2025-03-13

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, heterogeneity, and the scarcity of observed phenotype data in real-world settings. Ensemble learning offers a promising avenue to enhance the predictive accuracy of genetic risk assessments,...

10.1101/2025.01.06.25320058 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2025-01-06

Abstract. Objective: To address the challenges in modeling time-to-event outcomes in small-sample settings by leveraging transfer learning techniques while accounting for potential covariate and concept shifts between source and target datasets. Methods: We propose a novel transfer learning approach, termed CoxTL, for time-to-event data based on the widely used Cox proportional hazards model. CoxTL utilizes a combination of density ratio weighting and importance weighting to address multi-level heterogeneity, including covariate and coefficient shifts. Additionally, it accounts for model...

10.1101/2025.01.14.25320536 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2025-01-15

Across various domains, the growing advocacy for open science and open-source machine learning has made an increasing number of models publicly available. These models allow practitioners to integrate them into their own contexts, reducing the need for extensive data labeling, training, and calibration. However, selecting the best model for a specific target population remains challenging due to issues like limited transferability, heterogeneity, and the difficulty of obtaining true labels or outcomes in real-world settings. In...

10.48550/arxiv.2501.18084 preprint EN arXiv (Cornell University) 2025-01-29

Psychiatric disorders exhibit substantial genetic overlap, raising questions about the utility of transdiagnostic genetic risk models. Using data from the All of Us Research Program (N=102,091), we evaluated common psychiatric genetic (CPG) factor-based polygenic risk scores (PRSs) compared to standard disorder-specific PRSs. The CPG PRS consistently outperformed disorder-specific PRSs in predicting individual disorder risk, explaining 1.07 to 24.6 times more phenotypic variance across 11 conditions. Meanwhile, many disorder-specific PRSs retained independent...

10.1101/2025.03.26.25324720 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2025-03-27

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, heterogeneity, and the scarcity of observed phenotype data in real-world settings. Ensemble learning offers a promising avenue to enhance predictive...

10.21203/rs.3.rs-5976048/v1 preprint EN 2025-04-01
Alba Gutiérrez‐Sacristán Arnaud Serret-Larmande Meghan R. Hutch Carlos Sáez Bruce J. Aronow and 95 more Surbhi Bhatnagar Clara-Lea Bonzel Tianxi Cai Batsal Devkota David A. Hanauer Ne Hooi Will Loh Yuan Luo Bertrand Moal Taha Mohseni Ahooyi Wanjikũ Njoroge Gilbert S. Omenn L. Nelson Sanchez‐Pinto Andrew M. South Francesca Sperotto Amelia L.M. Tan Deanne M. Taylor Guillaume Verdy Shyam Visweswaran Zongqi Xia Janet Zahner Paul Avillach Florence T. Bourgeois James R. Aaron Giuseppe Agapito Adem Albayrak Giuseppe Albi M Alessiani Anna Alloni Danilo F. Amendola François Angoulvant Li L.L.J. Anthony Fatima Ashraf Andrew M. Atz Paula S Azevedo James Balshi Brett K. Beaulieu‐Jones Douglas S. Bell Antonio Bellasi Riccardo Bellazzi Vincent Benoît Michele Beraghi José Luis Bernal-Sobrino Mélodie Bernaux Romain Bey Alvar Blanco-Martínez Martin Boeker John Booth Silvano Bosari Robert L. Bradford Gabriel A. Brat Stéphane Breant Nicholas W. Brown Raffaele Bruno William Bryant Mauro Bucalo Emily Bucholz Anita Burgun Mario Cannataro Aldo Carmona Charlotte Caucheteux Julien Champ Jin Chen Krista Y. Chen Luca Chiovato Lorenzo Chiudinelli Kelly Cho James J. Cimino Tiago K. Colicchio Sylvie Cormont Sébastien Cossin Jean B. Craig Juan Luis Cruz Bermúdez Jaime Cruz‐Rojo Arianna Dagliati Mohamad Daniar Christel Daniel Priyam Das Audrey Dionne Rui Duan Julien Dubiel Scott L. DuVall Loïc Estève Hossein Estiri Shirley Fan Robert W Follett Thomas Ganslandt Noelia García Barrio Lana X. Garmire Nils Gehlenborg Emily Getzen Alon Geva Tomás González González Tobias Gradinger Alexandre Gramfort Romain Griffier

Importance: The COVID-19 pandemic has been associated with an increase in mental health diagnoses among adolescents, though the extent of the increase, particularly for severe cases requiring hospitalization, has not been well characterized. Large-scale federated informatics approaches provide the ability to efficiently and securely query health care data sets to assess and monitor hospitalization patterns for mental health conditions among adolescents. Objective: To estimate changes in the proportion of hospitalizations associated with mental health conditions among adolescents following onset...

10.1001/jamanetworkopen.2022.46548 article EN cc-by-nc-nd JAMA Network Open 2022-12-13

Linear mixed models are commonly used in healthcare-based association analyses for analyzing multi-site data with heterogeneous site-specific random effects. Due to regulations protecting patients' privacy, sensitive individual patient data (IPD) typically cannot be shared across sites. We propose an algorithm for fitting distributed linear mixed models (DLMMs) without sharing IPD across sites. This algorithm achieves results identical to those achieved using pooled IPD from multiple sites (i.e., the same effect size and standard error...

10.1038/s41467-022-29160-4 article EN cc-by Nature Communications 2022-03-30
Arianna Dagliati Zachary H. Strasser Zahra Shakeri Hossein Abad Jeffrey G. Klann Kavishwar B. Wagholikar and 95 more Rebecca Mesa Shyam Visweswaran Michele Morris Yuan Luo Darren W. Henderson Malarkodi Jebathilagam Samayamuthu Bryce W. Q. Tan Guillame Verdy Gilbert S. Omenn Zongqi Xia Riccardo Bellazzi James R. Aaron Giuseppe Agapito Adem Albayrak Giuseppe Albi M Alessiani Anna Alloni Danilo F. Amendola François Angoulvant Li L.L.J. Anthony Bruce J. Aronow Fatima Ashraf Andrew M. Atz Paul Avillach Paula S. Azevedo James Balshi Brett K. Beaulieu‐Jones Douglas S. Bell Antonio Bellasi Riccardo Bellazzi Vincent Benoît Michele Beraghi José Luis Bernal-Sobrino Mélodie Bernaux Romain Bey Surbhi Bhatnagar Alvar Blanco-Martínez Clara-Lea Bonzel John Booth Silvano Bosari Florence T. Bourgeois Robert L. Bradford Gabriel A. Brat Stéphane Breant Nicholas W. Brown Raffaele Bruno William Bryant Mauro Bucalo Emily Bucholz Anita Burgun Tianxi Cai Mario Cannataro Aldo Carmona Charlotte Caucheteux Julien Champ Jin Chen Krista Y. Chen Luca Chiovato Lorenzo Chiudinelli Kelly Cho James J. Cimino Tiago K. Colicchio Sylvie Cormont Sébastien Cossin Jean B. Craig Juan Luis Cruz Bermúdez Jaime Cruz‐Rojo Arianna Dagliati Mohamad Daniar Christel Daniel Priyam Das Batsal Devkota Audrey Dionne Rui Duan Julien Dubiel Scott L. DuVall Loïc Estève Hossein Estiri Shirley Fan Robert W. Follett Thomas Ganslandt Noelia García Barrio Lana X. Garmire Nils Gehlenborg Emily Getzen Alon Geva Tobias Gradinger Alexandre Gramfort Romain Griffier Nicolas Griffon Olivier Grisel Alba Gutiérrez‐Sacristán Larry Han David A. Hanauer Christian Haverkamp

Characterizing Post-Acute Sequelae of COVID (SARS-CoV-2 Infection), or

10.1016/j.eclinm.2023.102210 article EN cc-by EClinicalMedicine 2023-09-15

Large collaborative research networks provide opportunities to jointly analyze multicenter electronic health record (EHR) data, which can improve the sample size, diversity of the study population, and generalizability of the results. However, there are challenges to analyzing multicenter EHR data, including privacy protection, large-scale computation resource requirements, heterogeneity across sites, and correlated observations. In this paper, we propose a federated algorithm for generalized linear mixed models...

10.1371/journal.pone.0280192 article EN cc-by PLoS ONE 2023-01-17