Learning Diverse Risk Preferences in Population-Based Self-Play
FOS: Computer and information sciences
Machine Learning (cs.LG)
Artificial Intelligence (cs.AI)
Multiagent Systems (cs.MA)
DOI: 10.1609/aaai.v38i11.29188
Publication Date: 2024-03-25
AUTHORS (8)
ABSTRACT
Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current RL methods commonly optimize the agent to maximize its expected win-rate against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing agents to break stalemates and enhancing their robustness when facing different opponents. In this paper, we present a novel perspective to promote diversity: agents could hold diverse risk preferences in the face of uncertainty. To achieve this, we introduce a reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning, enabling policy learning with desired risk preferences. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.
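The abstract's core mechanism is a scalar risk preference that interpolates between worst-case and best-case value estimation. The sketch below is a minimal, hypothetical illustration of that idea, assuming a distributional (quantile-based) critic and a risk parameter phi in [-1, 1]; the function name risk_sensitive_value and the parameter phi are illustrative assumptions, not taken from the paper or its released code.

```python
import numpy as np

def risk_sensitive_value(quantiles: np.ndarray, phi: float) -> float:
    """Interpolate between worst-case and best-case value estimates.

    quantiles: samples/quantiles of the return distribution Z(s, a).
    phi: risk preference in [-1, 1] (an assumed parameterization):
         phi = -1 -> lowest quantile only (worst-case, risk-averse),
         phi =  0 -> plain expectation (risk-neutral),
         phi = +1 -> highest quantile only (best-case, risk-seeking).
    """
    q = np.sort(quantiles)
    n = len(q)
    if phi == 0.0:
        return float(q.mean())
    # Fraction of the distribution kept in the average; it shrinks toward
    # a single tail quantile as |phi| approaches 1 (a CVaR-like truncation).
    k = max(1, int(np.ceil((1.0 - abs(phi)) * n)))
    tail = q[:k] if phi < 0 else q[n - k:]
    return float(tail.mean())

# Illustrative population of agents with different risk preferences:
population_phis = np.linspace(-1.0, 1.0, 5)
returns = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=32)
for phi in population_phis:
    print(f"phi={phi:+.1f} -> value={risk_sensitive_value(returns, phi):+.3f}")
```

In a PPO-style update, such a risk-sensitive value would stand in for the risk-neutral critic when computing advantages; in population-based training, each agent could be assigned its own phi and have it mutated over time, which is one plausible reading of the "dynamic risk-sensitive objectives" the abstract describes.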
CITATIONS (1)