Latent Space Policies for Hierarchical Reinforcement Learning

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
DOI: 10.48550/arxiv.1804.02808 Publication Date: 2018-01-01
ABSTRACT
We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher-level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can also enable significantly more complex sparse-reward tasks to be solved by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
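
To make the stacking construction concrete, here is a minimal Python sketch, assuming a simple observation-conditioned affine bijection for each layer. It is not the authors' implementation: the paper builds each layer from learned real NVP-style bijective transformations, whereas the class name, the toy linear scale/shift parameters, and the dimensions below are illustrative assumptions. What the sketch shows is the structural idea from the abstract: each layer maps a latent sample to an action invertibly, composing layers means the higher layer's output serves as the lower layer's latent, and a standard-normal prior supplies the topmost latent sample.

import numpy as np

rng = np.random.default_rng(0)

class AffinePolicyLayer:
    """One policy layer: a = h * exp(s(obs)) + t(obs), invertible in the latent h."""

    def __init__(self, obs_dim, act_dim):
        # Toy linear "networks" for scale and shift; the paper learns
        # these as neural network bijectors (real NVP-style).
        self.Ws = 0.1 * rng.standard_normal((act_dim, obs_dim))
        self.Wt = 0.1 * rng.standard_normal((act_dim, obs_dim))

    def forward(self, obs, h):
        # Map latent h to an action, conditioned on the observation.
        s, t = self.Ws @ obs, self.Wt @ obs
        return h * np.exp(s) + t

    def inverse(self, obs, a):
        # Exact inverse: recover the latent that produced action a.
        # Invertibility is what lets higher layers fully control the
        # lower layer without constraining either one.
        s, t = self.Ws @ obs, self.Wt @ obs
        return (a - t) * np.exp(-s)

def act(layers, obs):
    # While a layer is being trained, its latent is drawn from a simple
    # prior; once a higher layer exists, its output replaces that sample.
    h = rng.standard_normal(layers[0].Ws.shape[0])  # top-level prior sample
    for layer in layers:  # ordered from highest to lowest layer
        h = layer.forward(obs, h)
    return h  # the bottom layer's output is the environment action

obs_dim, act_dim = 4, 2
layers = [AffinePolicyLayer(obs_dim, act_dim), AffinePolicyLayer(obs_dim, act_dim)]
print(act(layers, rng.standard_normal(obs_dim)))

Because each layer is a bijection in its latent, the composed policy loses no expressivity: for any desired bottom-level action there is exactly one top-level latent that produces it, which inverse() recovers layer by layer.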