NFDI4DS | UHH-SEMS - Publication Details

SOAC: The Soft Option Actor-Critic Architecture

FOS: Computer and information sciences; Computer Science - Machine Learning; Artificial Intelligence (cs.AI); Computer Science - Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2006.14363
Publication Date: 2020-01-01

AUTHORS (6)

Chenghao Li

Xiaoteng Ma

Chongjie Zhang

Jun Yang

Li Xia

Qianchuan Zhao

ABSTRACT

The option framework has shown great promise by automatically extracting temporally-extended sub-tasks from a long-horizon task. Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option selection policy. However, existing methods typically suffer from two major challenges: ineffective exploration and unstable updates. In this paper, we present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges. Our approach introduces an information-theoretical intrinsic reward for encouraging the identification of diverse and effective options. Meanwhile, we utilize a probability inference model to simplify the optimization problem as fitting optimal trajectories. Experimental results demonstrate that our approach significantly outperforms prior on-policy and off-policy methods in a range of Mujoco benchmark tasks while still providing benefits for transfer learning. In these tasks, our approach learns a diverse set of options, each of whose state-action space has strong coherence.
