A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes
DOI:
10.1609/aaai.v36i4.20302
Publication Date:
2022-07-04T11:06:24Z
AUTHORS (3)
ABSTRACT
This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). For a sufficiently large learning horizon K, the proposed algorithm achieves sublinear regret and zero constraint violation. The regret and constraint-violation bounds depend on the number of states S, the number of actions A, and two constants that are independent of K.
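For context, a minimal sketch of the standard problem formulation in this setting, following the usual average-reward CMDP definitions (the paper's exact definitions may differ in detail): given a reward function r, a utility function g, and a constraint threshold c, the learner aims to solve

\[
\max_{\pi}\ \liminf_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_\pi\!\left[\sum_{t=1}^{T} r(s_t,a_t)\right]
\quad \text{subject to} \quad
\liminf_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_\pi\!\left[\sum_{t=1}^{T} g(s_t,a_t)\right] \ge c .
\]

Writing $J^*_r$ for the optimal average reward, learning performance over a horizon of K steps is then typically measured by

\[
\mathrm{Regret}(K) = K\,J^*_r - \mathbb{E}\!\left[\sum_{k=1}^{K} r(s_k,a_k)\right],
\qquad
\mathrm{Violation}(K) = \left[\sum_{k=1}^{K} \bigl(c - g(s_k,a_k)\bigr)\right]_{+},
\]

so "sublinear regret" means Regret(K) = o(K) and "zero constraint violation" means Violation(K) = 0 once K is sufficiently large.

As an illustration only, the following is a generic primal-dual tabular Q-learning loop for an average-reward CMDP. It is not the paper's algorithm; it is a common model-free baseline sketched to make the setting concrete, and every name in it (step, S, A, c, K, alpha, eta, eps) is a hypothetical placeholder.

import numpy as np

def primal_dual_q_learning(step, S, A, c, K=100_000,
                           alpha=0.01, eta=0.01, eps=0.1, seed=0):
    """step(s, a) -> (next_state, reward, utility); constraint: average utility >= c."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((S, A))   # action-value estimates for the Lagrangian reward
    lam = 0.0              # dual variable (Lagrange multiplier) for the constraint
    rho = 0.0              # running estimate of the average Lagrangian reward
    s = 0
    for _ in range(K):
        # epsilon-greedy action selection
        a = int(rng.integers(A)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, g = step(s, a)
        # Lagrangian reward: trade the reward off against the constraint utility
        lagrangian = r + lam * g
        # relative (average-reward) Q-learning update
        td = lagrangian - rho + np.max(Q[s_next]) - Q[s, a]
        Q[s, a] += alpha * td
        rho += 0.1 * alpha * td          # slowly track the average-reward estimate
        # dual ascent: raise lam when the constraint (average utility >= c) looks violated
        lam = max(0.0, lam + eta * (c - g))
        s = s_next
    return Q, lam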