Off-policy Maximum Entropy Reinforcement Learning: Soft Actor-Critic with Advantage Weighted Mixture Policy (SAC-AWMP)

DOI: 10.48550/arxiv.2002.02829 Publication Date: 2020-01-01
ABSTRACT
The optimal policy of a reinforcement learning problem is often discontinuous and non-smooth; i.e., for two states with similar representations, their optimal policies can be significantly different. In this case, representing the entire policy with a function approximator (FA) with shared parameters for all states may not be desirable, as the generalization ability of parameter sharing makes representing a discontinuous, non-smooth policy difficult. A common way to solve this problem, known as Mixture-of-Experts, is to represent the policy as a weighted sum of multiple components, where different components perform well on different parts of the state space. Following this idea and inspired by recent work on advantage-weighted information maximization, we propose to learn, for each state, the weights of these components so that they entail the information of the state itself and also the preferred action learned so far for that state. The action preference is characterized via the advantage function. As a result, the weight of a component is only large for certain groups of states whose representations, and whose preferred actions, are similar, so each component is easy to represent. We call a policy parameterized in this way an Advantage Weighted Mixture Policy (AWMP) and apply it to improve soft actor-critic (SAC), one of the most competitive continuous control algorithms. Experimental results demonstrate that SAC with AWMP clearly outperforms SAC on four commonly used continuous control tasks and achieves stable performance across different random seeds.
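
For illustration only, the sketch below shows what a mixture policy head of this general kind might look like in PyTorch. It is not the authors' implementation: the network sizes, the shared standard deviation, the tanh squashing, and the gating network are assumptions, and the advantage-weighted information maximization objective used in AWMP to train the gate weights is omitted.

    # Minimal sketch (assumptions, not the paper's code): a per-state gating
    # network produces soft weights over K Gaussian component policies, and the
    # final action distribution uses the weighted sum of component means.
    import torch
    import torch.nn as nn

    class MixturePolicy(nn.Module):
        def __init__(self, state_dim, action_dim, num_components=4, hidden=256):
            super().__init__()
            # Gating network: soft weights over the K components for each state.
            self.gate = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, num_components),
            )
            # Each component is a small Gaussian policy head of its own.
            self.means = nn.ModuleList(
                nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                              nn.Linear(hidden, action_dim))
                for _ in range(num_components)
            )
            # Shared log-std per component (illustrative simplification).
            self.log_std = nn.Parameter(torch.zeros(num_components, action_dim))

        def forward(self, state):
            # weights: (batch, K), one soft assignment per state.
            weights = torch.softmax(self.gate(state), dim=-1)
            # Component means stacked to (batch, K, action_dim).
            mu = torch.stack([m(state) for m in self.means], dim=1)
            # Mixture mean as the gate-weighted sum of component means.
            mean = (weights.unsqueeze(-1) * mu).sum(dim=1)
            std = torch.exp(self.log_std).mean(dim=0)
            return torch.distributions.Normal(mean, std)

    # Usage: sample a squashed action, as SAC does with a Gaussian policy.
    policy = MixturePolicy(state_dim=17, action_dim=6)
    dist = policy(torch.randn(32, 17))
    action = torch.tanh(dist.rsample())

In the paper's setting the gate weights would additionally be shaped by the advantage function, so that states sharing similar preferred actions are routed to the same component; here the gate is trained only through the policy loss, which is the main simplification of this sketch.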
SUPPLEMENTAL MATERIAL
Coming soon ....