The Fragility of Optimized Bandit Algorithms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML); Statistics Theory (math.ST)
DOI: 10.48550/arxiv.2109.13595
Publication Date: 2024-12-30
AUTHORS (2): Fan and Glynn
ABSTRACT
What Does the Distribution of Regret Look Like for Bandit Algorithms? In multi-armed bandit theory, the expected regret has long been the predominant performance measure studied in the literature. In "The Fragility of Optimized Bandit Algorithms," Fan and Glynn study, for the first time, the tail of the regret distribution. For a given bandit model, it is natural to optimize an algorithm to achieve the minimum possible expected regret growth rate. However, the authors show that optimizing for expected regret alone can result in severe fragility. Optimized expected regret rates come with the side effect of extremely heavy regret tails (like those of a Cauchy distribution) and thus a significant chance of catastrophically large regret. Moreover, optimized algorithms essentially operate on a knife's edge: the slightest degree of model misspecification in algorithm design can result in a much larger expected regret rate. Among other insights, the authors characterize sharp trade-offs between the expected regret rate and the heaviness of the regret tail. They also show how to robustify an algorithm and lighten the regret tail through increased exploration.
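To make the exploration/tail trade-off concrete, here is a minimal simulation sketch — not the authors' experimental setup — that runs a standard UCB index policy on a two-armed Bernoulli bandit and compares the regret distribution under a classical exploration bonus versus an aggressively shrunken one. The arm means, horizon, and the bonus scale `c` are illustrative assumptions; the qualitative point is that less exploration tends to widen the upper tail of regret across independent replications.

```python
import numpy as np

def ucb_regret(means, horizon, c=2.0, rng=None):
    """Run a UCB index policy on a Bernoulli bandit; return final pseudo-regret.

    `c` scales the exploration bonus: c=2.0 is a classical choice, while a
    small `c` mimics the aggressively optimized regime whose regret tails
    the paper shows can become heavy. (Illustrative sketch only.)
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    # Pull each arm once to initialize the empirical means.
    for a in range(k):
        counts[a] = 1
        sums[a] = rng.random() < means[a]
    for t in range(k, horizon):
        # Index = empirical mean + exploration bonus.
        ucb = sums / counts + np.sqrt(c * np.log(t + 1) / counts)
        a = int(np.argmax(ucb))
        counts[a] += 1
        sums[a] += rng.random() < means[a]
    # Pseudo-regret: suboptimality gap weighted by pull counts.
    return float((max(means) - np.asarray(means)) @ counts)

# Compare median vs. upper-tail regret under more vs. less exploration.
rng = np.random.default_rng(0)
for c in (2.0, 0.1):
    regrets = [ucb_regret([0.5, 0.6], 2000, c=c, rng=rng) for _ in range(200)]
    print(f"c={c}: median={np.median(regrets):.1f}, "
          f"99th pct={np.percentile(regrets, 99):.1f}")
```

A spread between the median and the 99th percentile that grows as `c` shrinks is the simulation-level analogue of the heavy-tail phenomenon the paper analyzes.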