Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
FOS: Computer and information sciences
Computer Vision and Pattern Recognition (cs.CV)
DOI:
10.48550/arxiv.2502.01051
Publication Date:
2025-02-02
AUTHORS (9)
ABSTRACT
Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically leverage Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models face challenges in handling noisy images at different timesteps and require complex transformations into pixel space. In this work, we demonstrate that diffusion models are inherently well-suited for step-level reward modeling in the latent space, as they can naturally extract features from noisy latent images. Accordingly, we propose the Latent Reward Model (LRM), which repurposes components of the diffusion model to predict preferences of latent images at various timesteps. Building on LRM, we introduce Latent Preference Optimization (LPO), a method designed for step-level preference optimization directly in the latent space. Experimental results indicate that LPO not only significantly enhances performance in aligning diffusion models with general, aesthetic, and text-image alignment preferences, but also achieves a 2.5-28$\times$ training speedup compared to existing preference optimization methods. Our code will be available at https://github.com/casiatao/LPO.
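To make the abstract's core idea concrete, below is a minimal, self-contained sketch of a noise-aware latent reward model: it scores noisy latents directly, conditioned on the diffusion timestep, and is trained with a Bradley-Terry preference loss over (preferred, rejected) latent pairs. The small convolutional encoder here is a hypothetical stand-in for the reused diffusion backbone described in the abstract; the class and function names are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of diffusion timesteps."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class LatentRewardModel(nn.Module):
    """Maps a noisy latent x_t plus its timestep t to a scalar reward."""
    def __init__(self, latent_channels: int = 4, width: int = 64, t_dim: int = 128):
        super().__init__()
        # Hypothetical stand-in for a pretrained diffusion (UNet) encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(latent_channels, width, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.t_proj = nn.Linear(t_dim, width)   # inject timestep information
        self.head = nn.Sequential(nn.Linear(width, width), nn.SiLU(), nn.Linear(width, 1))
        self.t_dim = t_dim

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x_t) + self.t_proj(timestep_embedding(t, self.t_dim))
        return self.head(h).squeeze(-1)  # (B,) scalar rewards

def preference_loss(rm, x_win, x_lose, t):
    """Bradley-Terry loss: the preferred latent should score higher at timestep t."""
    return -F.logsigmoid(rm(x_win, t) - rm(x_lose, t)).mean()

# Toy usage: pairs of noisy latents at random timesteps.
rm = LatentRewardModel()
x_win, x_lose = torch.randn(8, 4, 32, 32), torch.randn(8, 4, 32, 32)
t = torch.randint(0, 1000, (8,))
loss = preference_loss(rm, x_win, x_lose, t)
loss.backward()
print(f"preference loss: {loss.item():.4f}")
```

Because the reward is computed on latents at arbitrary timesteps, such a model could in principle supervise every denoising step without decoding to pixel space, which is the efficiency argument the abstract makes for LPO.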