Policy Evaluation in Distributional LQR (Extended Version)

DOI: 10.48550/arxiv.2401.10240
Publication Date: 2024-01-01
ABSTRACT
Distributional reinforcement learning (DRL) enhances the understanding of the effects of randomness in the environment by letting agents learn the distribution of the random return, rather than only its expected value as in standard reinforcement learning. A key challenge in DRL, however, is that policy evaluation typically relies on a representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems based on the discounted linear quadratic regulator (LQR), which we call distributional LQR. Specifically, we provide a closed-form expression for the distribution of the random return that is applicable to all types of exogenous disturbance as long as it is independent and identically distributed (i.i.d.). We show that the variance of the random return is bounded if the fourth moment of the disturbance is bounded. Furthermore, we investigate the sensitivity of the return distribution to model perturbations. While the proposed exact return distribution consists of infinitely many random variables, we show that it can be well approximated by a finite number of random variables, and we bound the associated approximation error analytically under mild assumptions. When the model is unknown, we propose a model-free approach for estimating the return distribution, supported by sample complexity guarantees. Finally, we extend our approach to partially observable linear systems. Numerical experiments are provided to illustrate the theoretical results.
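To make the setting concrete, the following minimal sketch (not code from the paper) simulates the random return of a discounted LQR under a fixed linear policy and an i.i.d. disturbance, and estimates the return distribution by Monte Carlo. All matrices, the feedback gain K, the noise scale, and the discount factor gamma are illustrative assumptions.

```python
import numpy as np

# Illustrative discounted LQR setup (all values below are assumptions for
# this sketch, not parameters from the paper).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state dynamics
B = np.array([[0.0], [0.1]])             # input matrix
Q = np.eye(2)                            # state cost weight
R = np.array([[1.0]])                    # input cost weight
K = np.array([[-1.0, -1.5]])             # fixed stabilizing feedback gain, u = K x
gamma = 0.95                             # discount factor

def sample_return(x0, horizon=500):
    """One Monte Carlo sample of the truncated random return
    G = sum_t gamma^t (x_t' Q x_t + u_t' R u_t) under i.i.d. Gaussian noise."""
    x, G = x0.copy(), 0.0
    for t in range(horizon):
        u = K @ x
        G += gamma**t * (x @ Q @ x + u @ R @ u)
        w = rng.normal(scale=0.1, size=2)  # i.i.d. exogenous disturbance
        x = A @ x + B @ u + w
    return G

# Empirical return distribution from a fixed initial state.
x0 = np.array([1.0, 0.0])
returns = np.array([sample_return(x0) for _ in range(2000)])
print(f"mean return: {returns.mean():.3f}, variance: {returns.var():.3f}")
```

Truncating the discounted sum at a finite horizon is loosely analogous to the paper's finite-variable approximation of the exact return distribution: for a stabilizing gain, the neglected tail of the sum is geometrically small in the horizon length.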