Efficient Reinforcement Learning in Probabilistic Reward Machines
DOI:
10.1609/aaai.v39i18.34061
Publication Date:
2025-04-11T12:52:53Z
AUTHORS (2)
ABSTRACT
In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret bound of Õ((HOAT)^(1/2) + H²O²A^(3/2) + HT^(1/2)), where H is the time horizon, O is the number of observations, A is the number of actions, and T is the number of time-steps. This result improves over the best-known bound, Õ(H(OAT)^(1/2)), for MDPs with Deterministic Reward Machines (DRMs), a special case of PRMs. When T ≥ H³O³A² and OA ≥ H, our regret bound leads to a regret of Õ((HOAT)^(1/2)), which matches the established lower bound of Ω((HOAT)^(1/2)) for MDPs with DRMs up to a logarithmic factor. To the best of our knowledge, this is the first efficient algorithm for PRMs. Additionally, we present a new simulation lemma for non-Markovian rewards, which enables reward-free exploration for any non-Markovian reward given access to an approximate planner. Complementing our theoretical findings, we show through extensive experimental evaluations that our algorithm indeed outperforms prior methods in various PRM environments.
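To make the PRM object concrete, the following is a minimal sketch of a probabilistic reward machine; the class name, interface, and transition encoding are illustrative assumptions, not the paper's implementation. A PRM reads the observation (label) emitted at each environment step and stochastically transitions between machine states while emitting a reward, which is what makes the reward non-Markovian in the environment state alone.

```python
import random

class ProbabilisticRewardMachine:
    """Illustrative PRM sketch (hypothetical interface, not the paper's code)."""

    def __init__(self, initial_state, delta):
        # delta maps (machine_state, observation) to a list of
        # (next_state, reward, probability) outcomes whose probabilities sum to 1.
        self.initial_state = initial_state
        self.delta = delta
        self.state = initial_state

    def reset(self):
        # Return the machine to its initial state at the start of an episode.
        self.state = self.initial_state

    def step(self, observation):
        # Sample one (next_state, reward) outcome according to the
        # probabilistic transition for the current state and observation.
        outcomes = self.delta[(self.state, observation)]
        next_state, reward, _ = random.choices(
            outcomes, weights=[p for _, _, p in outcomes]
        )[0]
        self.state = next_state
        return reward
```

Setting every outcome's probability to 1 recovers a deterministic reward machine (DRM), matching the abstract's remark that DRMs are a special case of PRMs.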