PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation
FOS: Computer and information sciences
Machine Learning (cs.LG)
Artificial Intelligence (cs.AI)
Computer Vision and Pattern Recognition (cs.CV)
Machine Learning (stat.ML)
DOI:
10.48550/arxiv.2502.08106
Publication Date:
2025-02-11
AUTHORS (4)
ABSTRACT
Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when trained or fine-tuned on imbalanced datasets. This degradation is largely due to the disproportionate representation of majority and minority image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), which is constructed by combining the original ground-truth target with the distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
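The Product of Gaussians at the heart of PoGDiff has a standard closed form: the product of two Gaussian densities is (after renormalization) again a Gaussian whose precision is the sum of the precisions and whose mean is the precision-weighted average of the means. The sketch below illustrates only this generic identity, not the paper's actual training objective; the helper name `product_of_gaussians` and the example values are hypothetical.

```python
import numpy as np

def product_of_gaussians(mus, sigmas):
    """Combine 1-D Gaussians N(mu_i, sigma_i^2) into their (renormalized)
    product, which is again a Gaussian. Precisions add; the resulting mean
    is the precision-weighted average of the input means."""
    mus = np.asarray(mus, dtype=float)
    precisions = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    var = 1.0 / precisions.sum()           # combined variance
    mu = var * (precisions * mus).sum()    # precision-weighted mean
    return mu, np.sqrt(var)

# Hypothetical example: fuse an original target (mean 0.0) with a
# neighboring target (mean 2.0), both with unit standard deviation.
mu, sigma = product_of_gaussians([0.0, 2.0], [1.0, 1.0])
# With equal variances the mean lands at the midpoint and the variance halves.
```

Intuitively, this is why conditioning on a neighboring text embedding can help minority samples: the fused target pulls toward nearby, better-represented examples while tightening the variance.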