Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents
Certainty
Value (mathematics)
DOI: 10.48550/arxiv.2301.12601
Publication Date: 2023-01-01
AUTHORS (3)
ABSTRACT
The optimized certainty equivalent (OCE) is a family of risk measures that covers important examples such as entropic risk, conditional value-at-risk and mean-variance models. In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound. We derive an upper bound on the regret of the proposed algorithm, and also establish a minimax lower bound. Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
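To make the three examples named in the abstract concrete, the following is a minimal sketch of an empirical OCE. It assumes the standard reward-oriented definition OCE_u(X) = sup over lam of { lam + E[u(X - lam)] } for a concave utility u with u(0) = 0 and slope 1 at 0; the abstract does not state the paper's own sign conventions, so these may differ. The helper names (`oce`, `entropic`, `cvar`, `mean_variance`) and the sample data are hypothetical and are not taken from the paper.

```python
import math

def oce(samples, utility, iters=200):
    """Empirical OCE: maximize lam + mean(utility(x - lam)) over lam.

    Assumption: utility is concave with u(0) = 0 and slope 1 at 0, so the
    objective is concave in lam and a maximizer lies between the smallest
    and largest sample. Ternary search then suffices.
    """
    def objective(lam):
        return lam + sum(utility(x - lam) for x in samples) / len(samples)

    lo, hi = min(samples), max(samples)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # a maximizer lies to the right of m1
        else:
            hi = m2  # a maximizer lies to the left of m2
    return objective((lo + hi) / 2)

def entropic(gamma):
    # Exponential utility; the OCE equals -(1/gamma) * log E[exp(-gamma * X)].
    return lambda t: (1.0 - math.exp(-gamma * t)) / gamma

def cvar(alpha):
    # Piecewise-linear utility; the OCE equals the mean of the worst alpha-fraction.
    return lambda t: min(t, 0.0) / alpha

def mean_variance(c):
    # Quadratic utility (monotone only on part of its domain);
    # the OCE equals E[X] - c * Var(X).
    return lambda t: t - c * t * t

xs = [0.0, 1.0, 2.0, 3.0]  # illustrative reward samples, equally weighted
entropic_value = oce(xs, entropic(1.0))
cvar_value = oce(xs, cvar(0.5))
mean_var_value = oce(xs, mean_variance(0.5))
print(entropic_value, cvar_value, mean_var_value)
```

Each utility plugs into the same one-dimensional maximization, which is why one family covers all three risk measures. A recursive OCE, as in the formulation described above, would apply such a map to the next-step value at every stage rather than once to the total return.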