DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

FOS: Computer and information sciences · Machine Learning (cs.LG) · Artificial Intelligence (cs.AI) · Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2407.03300
Publication Date: 2024-07-03
ABSTRACT
Diffusion models (DMs) have revolutionized generative learning. They utilize a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff) to simplify this task by introducing complementary discrete latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and train the DM and encoder end-to-end. DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable. The discrete latents significantly simplify learning the DM's complex noise-to-data mapping by reducing the curvature of the DM's generative ODE. An additional autoregressive transformer models the distribution of the discrete latents, a simple step because DisCo-Diff requires only few discrete variables with small codebooks. We validate DisCo-Diff on toy data, several image synthesis tasks, as well as molecular docking, and find that introducing discrete latents consistently improves model performance. For example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned ImageNet-64/128 datasets with an ODE sampler.
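To make the training scheme concrete, here is a minimal PyTorch sketch of the discrete-continuous idea described above: an encoder infers a few discrete latents from the clean image (kept differentiable via straight-through Gumbel-softmax), and a denoiser conditioned on those latents is trained jointly with a simple denoising loss. All names (`Encoder`, `Denoiser`, `NUM_LATENTS`, `CODEBOOK_SIZE`), architecture details, and hyperparameters below are illustrative assumptions, not the paper's exact design; the paper's second-stage autoregressive transformer over the latents is not shown.

```python
# Minimal sketch of end-to-end DisCo-Diff-style training, assuming a
# straight-through Gumbel-softmax relaxation for the discrete latents and a
# simplified denoising loss. Architectures and constants are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LATENTS = 10     # assumed: only a few discrete latent variables
CODEBOOK_SIZE = 100  # assumed: small codebook per latent

class Encoder(nn.Module):
    """Infers logits for each discrete latent from a clean image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, NUM_LATENTS * CODEBOOK_SIZE)

    def forward(self, x):
        logits = self.head(self.backbone(x))
        return logits.view(-1, NUM_LATENTS, CODEBOOK_SIZE)

class Denoiser(nn.Module):
    """Tiny stand-in for a U-Net, conditioned on noise level and latents."""
    def __init__(self):
        super().__init__()
        self.cond = nn.Linear(NUM_LATENTS * CODEBOOK_SIZE, 64)
        self.conv1 = nn.Conv2d(3 + 1, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, x_noisy, sigma, z_onehot):
        # Broadcast the noise level as an extra input channel (simplified).
        s = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x_noisy.shape[2:])
        h = F.relu(self.conv1(torch.cat([x_noisy, s], dim=1)))
        # Condition on the discrete latents via a per-channel shift.
        h = h + self.cond(z_onehot.flatten(1)).view(-1, 64, 1, 1)
        return self.conv2(h)

def training_step(encoder, denoiser, x, tau=1.0):
    """One joint step: encoder and denoiser are trained end-to-end."""
    logits = encoder(x)
    # Straight-through Gumbel-softmax: discrete forward, soft backward.
    z = F.gumbel_softmax(logits, tau=tau, hard=True)
    # Sample a noise level and corrupt the data (simplified schedule).
    sigma = torch.rand(x.shape[0], device=x.device) * 2.0 + 0.01
    x_noisy = x + sigma.view(-1, 1, 1, 1) * torch.randn_like(x)
    x_denoised = denoiser(x_noisy, sigma, z)
    return F.mse_loss(x_denoised, x)
```

A full implementation would replace the stand-in denoiser with a U-Net and proper noise-level preconditioning, and anneal the Gumbel temperature `tau`. At generation time, the discrete latents would be sampled from the autoregressive transformer mentioned in the abstract rather than inferred by the encoder, after which the conditioned ODE is solved as usual.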