PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation

FOS: Computer and information sciences — Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Computer Vision and Pattern Recognition (cs.CV), Machine Learning (stat.ML)
DOI: 10.48550/arxiv.2502.08106 Publication Date: 2025-02-11
ABSTRACT
Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when trained or fine-tuned on imbalanced datasets, largely due to the disproportionate representation of majority and minority image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), constructed by combining the original ground-truth target with the predicted distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
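The key construction in the abstract is the Product of Gaussians. For two univariate Gaussians, the (renormalized) product is itself a Gaussian whose precision is the sum of the input precisions and whose mean is the precision-weighted average of the input means. The sketch below illustrates only this standard identity, not the paper's full PoG training target; the function name and interface are illustrative.

```python
def product_of_gaussians(mu1, var1, mu2, var2):
    """Renormalized product of N(mu1, var1) and N(mu2, var2).

    The product density is proportional to another Gaussian with
    precision = sum of input precisions, and mean = precision-weighted
    average of the input means.
    """
    precision = 1.0 / var1 + 1.0 / var2
    var = 1.0 / precision
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var

# Example: combining a target at 0.0 with a neighbor's target at 2.0,
# both with unit variance, yields a sharper Gaussian centered between them.
mu, var = product_of_gaussians(0.0, 1.0, 2.0, 1.0)
# mu == 1.0, var == 0.5
```

Intuitively, this is why a PoG target helps in the imbalanced setting: a minority sample's target distribution is sharpened and pulled toward the target of a semantically neighboring (text-embedding) sample, so minority examples borrow statistical strength from their neighbors.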