Diffusion Approximations for Thompson Sampling

FOS: Computer and information sciences Computer Science - Machine Learning FOS: Mathematics Mathematics - Statistics Theory Statistics Theory (math.ST) 0101 mathematics 01 natural sciences 62B15, 60J70 Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2105.09232 Publication Date: 2021-01-01
ABSTRACT
We study the behavior of Thompson sampling from perspective weak convergence. In regime where gaps between arm means scale as $1/\sqrt{n}$ with time horizon $n$, we show that dynamics evolve according to discrete versions SDE's and stochastic ODE's. As $n \to \infty$, converge weakly solutions corresponding Our convergence theory is developed first principles using Continuous Mapping Theorem, can be easily adapted analyze other sampling-based bandit algorithms. this regime, also limits many algorithms -- including designed for any exponential family rewards, involving bootstrap-based coincide those Gaussian sampling. Moreover, in these are generally robust model mis-specification.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES ()
CITATIONS ()
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....