DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

FOS: Computer and information sciences · Machine Learning (cs.LG) · Artificial Intelligence (cs.AI) · Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2407.03300
Publication Date: 2024-07-03
ABSTRACT
Diffusion models (DMs) have revolutionized generative learning. They utilize a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff) to simplify this task by introducing complementary discrete latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and train the DM and encoder end-to-end. DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable. The discrete latents significantly simplify learning the DM's complex noise-to-data mapping by reducing the curvature of the DM's generative ODE. An additional autoregressive transformer models the distribution of the discrete latents, a simple step because DisCo-Diff requires only few discrete variables with small codebooks. We validate DisCo-Diff on toy data, several image synthesis tasks, as well as molecular docking, and find that introducing discrete latents consistently improves model performance. For example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned ImageNet-64/128 datasets with an ODE sampler.
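To make the training scheme concrete, here is a minimal PyTorch sketch of the discrete-continuous idea described above: an encoder infers a few discrete latents from the clean image (kept differentiable via straight-through Gumbel-softmax), and a denoiser conditioned on those latents is trained jointly with a simple denoising loss. All names (`Encoder`, `Denoiser`, `NUM_LATENTS`, `CODEBOOK_SIZE`), architecture details, and hyperparameters below are illustrative assumptions, not the paper's exact design; the paper's second-stage autoregressive transformer over the latents is not shown.

```python
# Minimal sketch of end-to-end DisCo-Diff-style training, assuming a
# straight-through Gumbel-softmax relaxation for the discrete latents and a
# simplified denoising loss. Architectures and constants are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LATENTS = 10     # assumed: only a few discrete latent variables
CODEBOOK_SIZE = 100  # assumed: small codebook per latent

class Encoder(nn.Module):
    """Infers logits for each discrete latent from a clean image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, NUM_LATENTS * CODEBOOK_SIZE)

    def forward(self, x):
        logits = self.head(self.backbone(x))
        return logits.view(-1, NUM_LATENTS, CODEBOOK_SIZE)

class Denoiser(nn.Module):
    """Tiny stand-in for a U-Net, conditioned on noise level and latents."""
    def __init__(self):
        super().__init__()
        self.cond = nn.Linear(NUM_LATENTS * CODEBOOK_SIZE, 64)
        self.conv1 = nn.Conv2d(3 + 1, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, x_noisy, sigma, z_onehot):
        # Broadcast the noise level as an extra input channel (simplified).
        s = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x_noisy.shape[2:])
        h = F.relu(self.conv1(torch.cat([x_noisy, s], dim=1)))
        # Condition on the discrete latents via a per-channel shift.
        h = h + self.cond(z_onehot.flatten(1)).view(-1, 64, 1, 1)
        return self.conv2(h)

def training_step(encoder, denoiser, x, tau=1.0):
    """One joint step: encoder and denoiser are trained end-to-end."""
    logits = encoder(x)
    # Straight-through Gumbel-softmax: discrete forward, soft backward.
    z = F.gumbel_softmax(logits, tau=tau, hard=True)
    # Sample a noise level and corrupt the data (simplified schedule).
    sigma = torch.rand(x.shape[0], device=x.device) * 2.0 + 0.01
    x_noisy = x + sigma.view(-1, 1, 1, 1) * torch.randn_like(x)
    x_denoised = denoiser(x_noisy, sigma, z)
    return F.mse_loss(x_denoised, x)
```

A full implementation would replace the stand-in denoiser with a U-Net and proper noise-level preconditioning, and anneal the Gumbel temperature `tau`. At generation time, the discrete latents would be sampled from the autoregressive transformer mentioned in the abstract rather than inferred by the encoder, after which the conditioned ODE is solved as usual.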