AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

DOI: 10.1609/aaai.v36i4.20394 Publication Date: 2022-07-04T10:59:42Z
ABSTRACT
Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus rely heavily on counter-factual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it to other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to learn directly from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multi-task self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot using only one PC with three days of training. At the same time, AlphaHoldem takes only 2.9 milliseconds for each decision using a single GPU, more than 1,000 times faster than DeepStack. We release the history data among AlphaHoldem, Slumbot, and top human professionals in the author’s GitHub repository to facilitate further studies in this direction.
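The idea of training against historical versions of the learned model can be illustrated with a minimal sketch. Everything named here (Checkpoint, HistoricalPool, self_play, the pool capacity, and uniform opponent sampling) is a hypothetical stand-in chosen for illustration and is not taken from the paper; the abstract does not specify how opponents are stored or selected, and the actual policy update is replaced by a placeholder.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Checkpoint:
    """Frozen snapshot of the agent (hypothetical stand-in for model weights)."""
    version: int


@dataclass
class HistoricalPool:
    """Keeps the most recent `capacity` snapshots to serve as opponents."""
    capacity: int = 5
    snapshots: List[Checkpoint] = field(default_factory=list)

    def add(self, ckpt: Checkpoint) -> None:
        self.snapshots.append(ckpt)
        # Drop the oldest snapshot once the pool exceeds its capacity.
        if len(self.snapshots) > self.capacity:
            self.snapshots.pop(0)

    def sample_opponent(self, rng: random.Random) -> Checkpoint:
        # Uniform sampling is an assumption made for this sketch.
        return rng.choice(self.snapshots)


def self_play(num_iterations: int, pool: HistoricalPool,
              seed: int = 0) -> List[Tuple[int, int]]:
    """Return the (learner version, opponent version) pairing of each iteration."""
    rng = random.Random(seed)
    learner = Checkpoint(version=0)
    pool.add(learner)
    matchups = []
    for step in range(1, num_iterations + 1):
        # The pool includes the current learner, so mirror matches can occur.
        opponent = pool.sample_opponent(rng)
        matchups.append((learner.version, opponent.version))
        # Placeholder for a policy update from the games just played.
        learner = Checkpoint(version=step)
        pool.add(learner)
    return matchups
```

Calling self_play(10, HistoricalPool(capacity=3)) returns ten pairings in which the opponent is always one of the three most recent snapshots. A real system would replace the placeholder with game simulation and a gradient update, and might choose opponents by a performance metric rather than uniformly.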