Online Posterior Sampling with a Diffusion Prior

FOS: Computer and information sciences; Machine Learning (cs.LG); Machine Learning (stat.ML)
DOI: 10.48550/arxiv.2410.03919
Publication Date: 2024-10-04
ABSTRACT
Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient, but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse diffusion process, which are estimated in a closed form. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.
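The chain-of-conditional-posteriors idea can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration rather than the paper's exact algorithm: it assumes a linear-Gaussian reward model, a generic denoiser(theta_t, t) that predicts the clean parameter, and an illustrative variance schedule alphas; at each reverse-diffusion stage a Gaussian conditional prior is combined with the Gaussian likelihood of the observed rewards in closed form, mirroring posterior sampling with a Gaussian prior.

import numpy as np

def gaussian_posterior(mu0, Sigma0, X, y, noise_var):
    # Closed-form posterior of theta ~ N(mu0, Sigma0) given rewards y = X theta + N(0, noise_var I).
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma = np.linalg.inv(Sigma0_inv + X.T @ X / noise_var)
    mu = Sigma @ (Sigma0_inv @ mu0 + X.T @ y / noise_var)
    return mu, Sigma

def diffusion_posterior_sample(X, y, noise_var, denoiser, alphas, rng):
    # Sketch: draw an approximate posterior sample of the bandit parameter by chaining
    # Gaussian conditional posteriors, one per stage of the reverse diffusion process.
    # `denoiser` and `alphas` are assumed, illustrative components (alphas in (0, 1)).
    d = X.shape[1]
    theta_t = rng.standard_normal(d)          # start from the terminal stage N(0, I)
    for t in reversed(range(len(alphas))):
        # Gaussian conditional prior for the next (less noisy) stage given theta_t.
        mu_prior = denoiser(theta_t, t)
        Sigma_prior = (1.0 - alphas[t]) * np.eye(d)
        # Combine the stage prior with the Gaussian likelihood in closed form.
        mu_post, Sigma_post = gaussian_posterior(mu_prior, Sigma_prior, X, y, noise_var)
        theta_t = rng.multivariate_normal(mu_post, Sigma_post)
    return theta_t

In a Thompson sampling loop, one such draw per round would replace the usual Gaussian posterior sample, and the agent would then act greedily with respect to the sampled parameter.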