NFDI4DS | UHH-SEMS - Publication Details

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

FOS: Computer and information sciences; Computer Science - Machine Learning; Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2502.10138
Publication Date: 2025-01-01


AUTHORS (8)

Kitamura, Toshinori

Ghosh, Arnob

Kozuno, Tadashi

Kumagai, Wataru

Kasaura, Kazumi

Hoshino, Kenta

Hosoe, Yohei

Matsuo, Yutaka

ABSTRACT

We study the reinforcement learning (RL) problem in a constrained Markov decision process (CMDP), where an agent explores the environment to maximize the expected cumulative reward while satisfying a single constraint on the expected total utility value in every episode. While this problem is well understood in the tabular setting, theoretical results for function approximation remain scarce. This paper closes the gap by proposing an RL algorithm for linear CMDPs that achieves $\tilde{\mathcal{O}}(\sqrt{K})$ regret with an episode-wise zero-violation guarantee. Furthermore, our method is computationally efficient, scaling polynomially with problem-dependent parameters while remaining independent of the state space size. Our results significantly improve upon recent linear CMDP algorithms, which either violate the constraint or incur exponential computational costs.
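
The problem described in the abstract can be sketched formally as follows. The notation is assumed for illustration and does not appear on this page: $H$ is the episode length, $r$ the reward, $u$ the utility, $b$ the constraint threshold, $s_1$ the initial state, and $\pi_k$ the policy deployed in episode $k$. The direction of the constraint (utility at least $b$) is also an assumption. For $g \in \{r, u\}$, write

$$V_g^{\pi}(s_1) = \mathbb{E}_{\pi}\left[\sum_{h=1}^{H} g_h(s_h, a_h) \,\middle|\, s_1\right].$$

The comparator policy solves the constrained problem

$$\pi^\star \in \arg\max_{\pi} V_r^{\pi}(s_1) \quad \text{subject to} \quad V_u^{\pi}(s_1) \ge b.$$

Under these assumptions, the two guarantees claimed over $K$ episodes would read

$$\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V_r^{\pi^\star}(s_1) - V_r^{\pi_k}(s_1) \right) = \tilde{\mathcal{O}}(\sqrt{K}), \qquad V_u^{\pi_k}(s_1) \ge b \ \text{ for every } k \in \{1, \dots, K\}.$$

The second condition is what separates episode-wise safety from a bound on cumulative violation: a cumulative bound lets a violation in one episode be offset by slack in another, whereas here every deployed policy must satisfy the constraint on its own.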
