A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes

DOI: 10.1609/aaai.v36i4.20302 Publication Date: 2022-07-04T11:06:24Z
ABSTRACT
This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). For a sufficiently large horizon K, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants that are independent of K.
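To make "sublinear regret" and "zero constraint violation" concrete, the sketch below gives one common way of formalizing them for an average-reward CMDP. All notation other than K is an assumption introduced here for illustration and is not taken from the abstract: the reward function r, the utility function g, the threshold \rho, the long-run averages J_r and J_g, and the optimal feasible policy \pi^*. The paper's exact definitions may differ.

```latex
% Assumed setup: reward r(s,a), utility g(s,a), constraint threshold \rho.
% J_r^{\pi} and J_g^{\pi} denote the long-run average reward and utility of policy \pi.
\[
  \pi^* \in \arg\max_{\pi} \; J_r^{\pi}
  \quad \text{subject to} \quad J_g^{\pi} \ge \rho .
\]
% Regret and constraint violation accumulated over K steps of learning:
\[
  \mathrm{Regret}(K) = K\, J_r^{\pi^*} - \mathbb{E}\Big[\sum_{k=1}^{K} r(s_k, a_k)\Big],
  \qquad
  \mathrm{Violation}(K) = \mathbb{E}\Big[\sum_{k=1}^{K} \big(\rho - g(s_k, a_k)\big)\Big].
\]
% Sublinear regret: the per-step loss vanishes as K grows.
% Zero violation: the accumulated constraint deficit is non-positive for large enough K.
\[
  \lim_{K \to \infty} \frac{\mathrm{Regret}(K)}{K} = 0,
  \qquad
  \mathrm{Violation}(K) \le 0 \quad \text{for sufficiently large } K .
\]
```

Under this reading, the algorithm's average reward approaches that of the best feasible policy while the constraint is satisfied in aggregate once K is large enough.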