Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

Keywords: Generative model, Synthetic data
DOI: 10.1038/s41746-024-01076-x Publication Date: 2024-03-20T19:01:58Z
ABSTRACT
Clinical research relies on high-quality patient data; however, obtaining big data sets is costly, and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these boundaries, allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence, CTAB-GAN+ and normalizing flows (NFlow), to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, who were treated within four multicenter clinical trials. Both generative models accurately captured distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes, yielding high performance scores regarding fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis, enabling explorative analysis in our synthetic data. Additionally, training sample privacy is safeguarded, mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof-of-concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to synthetic data sets to foster further research.
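To illustrate the kind of privacy check the abstract refers to, here is a minimal Python sketch of a Hamming-distance comparison between synthetic and training records. It is not the authors' implementation. The helper names, the toy records, and the assumption that every variable has already been discretized into categories are all hypothetical, since the abstract does not describe how the distances were computed.

```python
from typing import List, Sequence


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Count the positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_training(
    synthetic: Sequence[Sequence], training: Sequence[Sequence]
) -> List[int]:
    """For each synthetic record, return the distance to its closest training record."""
    return [min(hamming_distance(s, t) for t in training) for s in synthetic]


# Toy, made-up records (hypothetical columns: sex, age group, mutation A, mutation B).
training = [("f", "60-69", 1, 0), ("m", "50-59", 0, 0), ("m", "70-79", 1, 1)]
synthetic = [("f", "60-69", 1, 1), ("m", "50-59", 0, 0)]

distances = min_distance_to_training(synthetic, training)
print(distances)  # a distance of 0 flags a synthetic record identical to a training record
```

In this sketch, a minimum distance of 0 means a synthetic record exactly reproduces a training record, while larger minimum distances suggest the generated record is not a copy of any real patient.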
REFERENCES (57)
CITATIONS (24)