Diffusion Approximations for Thompson Sampling
FOS: Computer and information sciences
Computer Science - Machine Learning
FOS: Mathematics
Mathematics - Statistics Theory
Statistics Theory (math.ST)
0101 mathematics
01 natural sciences
62B15, 60J70
Machine Learning (cs.LG)
DOI:
10.48550/arxiv.2105.09232
Publication Date:
2021-01-01
AUTHORS (2)
ABSTRACT
We study the behavior of Thompson sampling from perspective weak convergence. In regime where gaps between arm means scale as $1/\sqrt{n}$ with time horizon $n$, we show that dynamics evolve according to discrete versions SDE's and stochastic ODE's. As $n \to \infty$, converge weakly solutions corresponding Our convergence theory is developed first principles using Continuous Mapping Theorem, can be easily adapted analyze other sampling-based bandit algorithms. this regime, also limits many algorithms -- including designed for any exponential family rewards, involving bootstrap-based coincide those Gaussian sampling. Moreover, in these are generally robust model mis-specification.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES ()
CITATIONS ()
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....