The Fragility of Optimized Bandit Algorithms

DOI: 10.48550/arxiv.2109.13595
Publication date: 2024-12-30
ABSTRACT
What Does the Distribution of Regret Look Like for Bandit Algorithms? In multi-armed bandit theory, expected regret has long been the predominant performance measure studied in the literature. In "The Fragility of Optimized Bandit Algorithms," Fan and Glynn study, for the first time, the tail of the regret distribution. For a given bandit model, it is natural to tune an algorithm to achieve the minimum possible growth rate of expected regret. The authors show, however, that optimizing for expected regret alone can result in severe fragility: optimized expected regret rates come with the side effect of extremely heavy regret tails (like those of a Cauchy distribution), and hence a significant chance of catastrophically large regret. Moreover, optimized algorithms essentially operate on a knife's edge, so that even the slightest model misspecification in algorithm design can produce a much larger expected regret rate. Among other insights, the authors characterize sharp trade-offs between the expected regret rate and the heaviness of the regret tail. They also show how to robustify an algorithm and lighten the regret tail through increased exploration.
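The trade-off described above can be probed empirically. The sketch below is a minimal illustration, not the authors' construction: it runs a standard UCB index policy on a two-armed Gaussian bandit and compares the simulated regret distribution for a small exploration constant against a larger one. The specific arm means, horizon, and exploration constants (`c = 0.5` vs. `c = 4.0`) are illustrative assumptions; the theory predicts that more aggressive exploration raises typical regret but lightens the upper tail.

```python
import math
import random

def ucb_regret(n_rounds, means, c, rng):
    """Run UCB with exploration constant c on a unit-variance Gaussian
    bandit with the given arm means; return the cumulative pseudo-regret."""
    k = len(means)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # cumulative reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1       # pull each arm once to initialize
        else:
            # index = empirical mean + exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(c * math.log(t) / counts[i]))
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

rng = random.Random(0)
means = [0.0, 0.5]            # illustrative two-armed instance
light = sorted(ucb_regret(2000, means, 0.5, rng) for _ in range(200))
heavy = sorted(ucb_regret(2000, means, 4.0, rng) for _ in range(200))
# Compare typical behavior (median) with the upper tail (99th percentile):
print("c=0.5: median %.1f  99th pct %.1f" % (light[100], light[197]))
print("c=4.0: median %.1f  99th pct %.1f" % (heavy[100], heavy[197]))
```

With only 200 replications the tail estimates are noisy; a serious study of tail heaviness would use many more runs and examine quantiles far beyond the 99th percentile.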