Generative AI to augment the fairness of foundation models in cancer pathology diagnosis.
DOI: 10.1200/jco.2025.43.16_suppl.e23230
Publication Date: 2025-05-28T16:26:46Z
ABSTRACT
Background:
Pathology foundation models, state-of-the-art deep learning models trained on large and diverse datasets, have shown the ability to extract pathology patterns useful for cancer diagnosis. However, their reliability across demographic groups is hindered by the limited number of training samples from minority populations. To address this challenge, we developed a generative AI-based approach, Fairness Denoising Diffusion Probabilistic Models (Fairness DDPM), to enhance the fairness of pathology foundation models.
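The abstract does not detail the generative component, so the sketch below only illustrates the general idea of using a denoising diffusion model to synthesize additional pathology patches for an under-represented subgroup; it is not the authors' implementation. It assumes the Hugging Face diffusers library, 128x128 RGB patches scaled to [-1, 1], and a hypothetical minority_patch_loader standing in for a real loader over TCGA patches.

```python
# Minimal sketch of diffusion-based augmentation for an under-represented
# subgroup (not the authors' implementation). Assumes the Hugging Face
# `diffusers` library; `minority_patch_loader` is a hypothetical stand-in
# for a real DataLoader over minority-subgroup TCGA patches.
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler, DDPMPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet2DModel(sample_size=128, in_channels=3, out_channels=3).to(device)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Hypothetical loader: batches of 128x128 RGB patches scaled to [-1, 1].
minority_patch_loader = [torch.randn(4, 3, 128, 128).clamp(-1, 1) for _ in range(8)]

for epoch in range(5):
    for clean_images in minority_patch_loader:
        clean_images = clean_images.to(device)
        noise = torch.randn_like(clean_images)
        timesteps = torch.randint(0, scheduler.config.num_train_timesteps,
                                  (clean_images.shape[0],), device=device)
        # Forward (noising) process, then train the U-Net to predict the noise.
        noisy_images = scheduler.add_noise(clean_images, noise, timesteps)
        noise_pred = model(noisy_images, timesteps).sample
        loss = F.mse_loss(noise_pred, noise)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Sample synthetic minority-subgroup patches to rebalance the training set.
pipeline = DDPMPipeline(unet=model, scheduler=scheduler)
synthetic_patches = pipeline(batch_size=16).images  # list of PIL images
```

In practice the sampled patches would be pooled with the real minority-subgroup data before training or fine-tuning the downstream classifier on top of the foundation-model features.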
Methods:
We obtained 30,664 whole-slide pathology images from The Cancer Genome Atlas (TCGA) database covering 33 cancer types. Self-reported race, sex, and age were collected alongside the images. Fairness DDPM mitigates biases by augmenting data from minority populations via a generative diffusion model. We evaluated three pathology foundation models (GigaPath, UNI, and CHIEF) on tumor detection and genetic mutation prediction tasks. We then incorporated Fairness DDPM into the training process and evaluated model performance across patient populations. We assessed model fairness using accuracy difference (AccDiff), area under the receiver operating characteristic curve difference (AUCDiff), equal opportunity (EOpp), and equal balanced accuracy (EBAcc).
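For concreteness, below is a minimal sketch (not the authors' evaluation code) of how such subgroup fairness metrics can be computed, assuming binary labels, predicted scores, and one sensitive attribute per slide. AccDiff, AUCDiff, EOpp, and EBAcc are interpreted here as the largest between-group gaps in accuracy, AUC, true-positive rate, and balanced accuracy, respectively; the abstract does not give exact formulas, so these definitions are assumptions.

```python
# Sketch of subgroup fairness gaps (assumed definitions, not the authors' code).
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             recall_score, roc_auc_score)

def subgroup_fairness_gaps(y_true, y_score, groups, threshold=0.5):
    """Largest between-group gap for each metric (one reading of
    AccDiff, AUCDiff, EOpp, and EBAcc)."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    y_pred = (y_score >= threshold).astype(int)
    per_group = {"AccDiff": {}, "AUCDiff": {}, "EOpp": {}, "EBAcc": {}}
    for g in np.unique(groups):
        m = groups == g
        per_group["AccDiff"][g] = accuracy_score(y_true[m], y_pred[m])
        per_group["AUCDiff"][g] = roc_auc_score(y_true[m], y_score[m])
        per_group["EOpp"][g] = recall_score(y_true[m], y_pred[m])  # TPR per group
        per_group["EBAcc"][g] = balanced_accuracy_score(y_true[m], y_pred[m])
    return {name: max(vals.values()) - min(vals.values())
            for name, vals in per_group.items()}

# Toy example with synthetic predictions for two demographic groups.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)
y_score = np.clip(0.35 * y_true + rng.normal(0.4, 0.2, 200), 0, 1)
groups = np.where(rng.random(200) < 0.3, "group_A", "group_B")
print(subgroup_fairness_gaps(y_true, y_score, groups))
```

Reported here is the largest pairwise gap per metric; other aggregations (for example, the mean absolute deviation from the overall metric) are equally plausible readings of the abstract.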
Results:
Fairness DDPM significantly reduced AI bias across datasets and tasks (Table 1). Specifically, Fairness DDPM completely eliminated racial bias in AUCDiff and EBAcc for the CHIEF model and in AUCDiff, AccDiff, and EBAcc for the UNI model. In addition, Fairness DDPM eliminated 80.0% of gender and age bias in EBAcc for the GigaPath model. Across diagnostic tasks, Fairness DDPM significantly mitigated bias for all sensitive attributes (race, sex, and age).
Conclusions:
Our study shows that Fairness DDPM effectively mitigates biases in pathology foundation models. By incorporating Fairness DDPM, AI diagnostic algorithms achieved greater equity across patient populations, representing a pivotal step toward the global adoption of fair and reliable AI in cancer pathology diagnosis.
Table 1. Comparison of Fairness DDPM and conventional foundation models.

(a) CHIEF
Evaluation Metric | Fairness DDPM | Baseline | % Mitigated
EOpp              | 21/41         | 41/230   | 51.22%
EBAcc             | 7/17          | 17/115   | 41.18%
AccDiff           | 8/14          | 14/115   | 57.14%
AUCDiff           | 5/18          | 18/115   | 27.78%

(b) UNI
Evaluation Metric | Fairness DDPM | Baseline | % Mitigated
EOpp              | 23/34         | 34/230   | 67.65%
EBAcc             | 7/12          | 12/115   | 58.33%
AccDiff           | 7/13          | 13/115   | 53.85%
AUCDiff           | 6/16          | 16/115   | 37.50%

(c) GigaPath
Evaluation Metric | Fairness DDPM | Baseline | % Mitigated
EOpp              | 8/21          | 21/230   | 38.10%
EBAcc             | 8/10          | 10/115   | 80.00%
AccDiff           | 4/10          | 10/115   | 40.00%
AUCDiff           | 7/14          | 14/115   | 50.00%

*Cancer types covered in this study: BRCA, LUAD, UCEC, COAD, READ, LUSC, HNSC, KIRC, LGG, SKCM, STAD, BLCA, LIHC, SARC, THYM, CESC, PAAD, KICH, CHOL, OV, THCA.
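A note on reading Table 1 (our interpretation; the abstract does not define the columns explicitly): the Baseline column appears to count biased subgroup comparisons out of all evaluated comparisons, the Fairness DDPM column counts how many of those baseline biases were mitigated, and % Mitigated is their ratio, as in the CHIEF EOpp row below.

```python
# % Mitigated as the share of baseline-biased comparisons resolved by
# Fairness DDPM (interpretation of Table 1, not stated in the source).
mitigated, baseline_biased = 21, 41          # CHIEF, EOpp row of Table 1
print(f"{mitigated / baseline_biased:.2%}")  # -> 51.22%
```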