Risk‐Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

DOI: 10.1111/poms.13252
Publication Date: 2020-08-04
ABSTRACT
This study investigates the optimization problem of an infinite-stage, discrete-time Markov decision process (MDP) with a long-run average metric that considers the mean and variance of rewards together. Such a performance metric is important since the mean indicates average returns and the variance indicates risk or fairness. However, because the variance metric couples the rewards at all stages, traditional dynamic programming is inapplicable, as the principle of time consistency fails. We study this problem from a new perspective called the sensitivity-based optimization theory. A difference formula is derived that can quantify the difference of the mean-variance combined metrics of MDPs under any two different policies. The difference formula can be utilized to generate new policies with strictly improved mean-variance performance. A necessary condition of the optimal policy and the optimality of deterministic policies are derived. We further develop an iterative algorithm with a form of policy iteration, which is proved to converge to local optima in the mixed (randomized) policy space. In particular, when the mean reward is constant across policies, the algorithm is guaranteed to converge to the global optimum. Finally, we apply our approach to study the fluctuation reduction of wind power in an energy storage system, which demonstrates the potential applicability of our optimization method.
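To make the flavor of the approach concrete, below is a minimal, illustrative Python sketch of a policy-iteration-style search for the combined metric xi(pi) = eta(pi) - beta * sigma^2(pi), where eta is the long-run average reward and sigma^2 its steady-state variance. It assumes the common sensitivity-based construction in which the variance penalty is folded into a pseudo-reward centered at the current mean; the weight beta, the toy MDP, and all function names are illustrative assumptions, not the paper's exact formulation (which also handles mixed randomized policies).

```python
import numpy as np

def stationary_dist(P):
    """Stationary distribution mu of an ergodic transition matrix P (mu P = mu)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0  # normalization: mu sums to 1
    return np.linalg.lstsq(A, b, rcond=None)[0]

def mv_policy_iteration(P, r, beta=0.5, max_iter=100):
    """Policy-iteration-style search for xi = mean - beta * variance.

    P: (A, S, S) transition matrices per action; r: (S, A) per-state-action rewards.
    Simplified deterministic-policy variant; the paper proves convergence to
    local optima in the mixed (randomized) policy space.
    """
    num_actions, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        Ppi = P[policy, np.arange(S)]        # (S, S) transitions under current policy
        rpi = r[np.arange(S), policy]        # (S,) rewards under current policy
        mu = stationary_dist(Ppi)
        eta = mu @ rpi                       # long-run mean reward
        # Fold the variance penalty into a pseudo-reward around the current mean:
        # mu @ (r - beta*(r - eta)^2) equals mean - beta * variance exactly.
        f = r - beta * (r - eta) ** 2        # (S, A) pseudo-rewards
        fpi = f[np.arange(S), policy]
        xi = mu @ fpi                        # combined metric of current policy
        # Performance potentials g of the pseudo-reward MDP:
        # (I - Ppi + 1 mu^T) g = fpi - xi, which also enforces mu @ g = 0.
        g = np.linalg.solve(np.eye(S) - Ppi + np.outer(np.ones(S), mu), fpi - xi)
        # Greedy improvement step, in the spirit of the difference formula.
        Q = f + np.einsum('ask,k->sa', P, g)
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):
            break                            # local optimum reached
        policy = new_policy
    return policy, eta, xi

# Toy usage on a random ergodic MDP (all quantities illustrative).
rng = np.random.default_rng(0)
num_actions, S = 2, 4
P = rng.dirichlet(np.ones(S), size=(num_actions, S))   # (A, S, S), rows sum to 1
r = rng.uniform(0.0, 1.0, size=(S, num_actions))
policy, eta, xi = mv_policy_iteration(P, r, beta=1.0)
print("policy:", policy, "mean:", round(eta, 3), "combined metric:", round(xi, 3))
```

Because eta changes between iterations, the pseudo-reward is re-centered on every pass; this mirrors the role of the paper's difference formula, which certifies that each improvement step strictly increases the combined metric.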