NFDI4DS | UHH-SEMS - Publication Details

A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes

Sublinear function; Time horizon; Zero (linguistics)

DOI: 10.1609/aaai.v36i4.20302
Publication Date: 2022-07-04T11:06:24Z

AUTHORS (3)

Honghao Wei

Xin Liu

Lei Ying

ABSTRACT

This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). Considering a learning horizon K that is sufficiently large, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants that are independent of K.
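
The two guarantees named in the abstract can be made concrete with the definitions commonly used for average-reward CMDPs. This is only a sketch under assumed notation: the symbols J^*, r, g, and \rho, and the choice of a utility constraint with a lower threshold, do not appear on this page and may differ from the paper's own formulation.

```latex
% Assumed notation (not taken from this page):
%   J^*      optimal average reward among policies satisfying the constraint
%   r(s,a)   reward function,  g(s,a)  utility function
%   \rho     required lower bound on the long-run average utility
%   (s_k, a_k) state-action pair visited at step k = 1, ..., K
\begin{align}
  \mathrm{Regret}(K)    &= K J^* - \mathbb{E}\!\left[\sum_{k=1}^{K} r(s_k, a_k)\right], \\
  \mathrm{Violation}(K) &= \mathbb{E}\!\left[\sum_{k=1}^{K} \bigl(\rho - g(s_k, a_k)\bigr)\right].
\end{align}
% "Sublinear regret":          \mathrm{Regret}(K) / K \to 0 as K \to \infty.
% "Zero constraint violation": \mathrm{Violation}(K) \le 0 for sufficiently large K.
```

Under these assumed definitions, sublinear regret means the average reward collected per step approaches the constrained optimum as K grows, while zero violation means the accumulated utility does not fall short of the threshold once K is large enough.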


CITATIONS (6)
