Robust Reinforcement Learning for Continuous Control with Model Misspecification

Robustness
DOI: 10.48550/arxiv.1906.07516 Publication Date: 2019-01-01
ABSTRACT
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics, which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst-case expected return objective, and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine Mujoco domains with environment perturbations, and demonstrate improved robust performance on a high-dimensional, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework, including an adaptation to another continuous control algorithm as well as learning the uncertainty set from offline data. Performance videos can be found online at https://sites.google.com/view/robust-rl.
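A schematic reading of the worst-case objective described above (a sketch inferred from the abstract; the paper's exact notation, regularization, and constraints may differ): given an uncertainty set of transition models $\mathcal{P}(s,a)$, the robust entropy-regularized Bellman operator backs up the value under the worst-case model,

$$(\mathcal{T}_{\mathrm{robust}} V)(s) = \max_{\pi(\cdot \mid s)} \; \inf_{p \in \mathcal{P}(s,a)} \; \mathbb{E}_{a \sim \pi(\cdot \mid s)} \Big[ r(s,a) + \lambda \, \mathcal{H}\big(\pi(\cdot \mid s)\big) + \gamma \, \mathbb{E}_{s' \sim p(\cdot \mid s,a)} V(s') \Big],$$

whereas the less conservative soft-robust variant replaces the infimum over $\mathcal{P}(s,a)$ with an expectation under a fixed distribution $\mu$ over the models in the set, i.e. $\mathbb{E}_{p \sim \mu}[\,\cdot\,]$.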