SOAC: The Soft Option Actor-Critic Architecture

FOS: Computer and information sciences; Computer Science - Machine Learning; Artificial Intelligence (cs.AI); Computer Science - Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2006.14363
Publication Date: 2020-01-01
ABSTRACT
The option framework has shown great promise by automatically extracting temporally-extended sub-tasks from a long-horizon task. Methods have been proposed for concurrently learning low-level intra-option policies and a high-level selection policy. However, existing methods typically suffer from two major challenges: ineffective exploration and unstable updates. In this paper, we present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges. Our approach introduces an information-theoretical intrinsic reward for encouraging the identification of diverse and effective options. Meanwhile, we utilize probability inference to simplify the optimization problem as fitting optimal trajectories. Experimental results demonstrate that our approach significantly outperforms prior on-policy methods in a range of Mujoco benchmark tasks while still providing benefits for transfer learning. In these tasks, our approach learns a set of options, each of whose state-action space has strong coherence.