Variance aware reward smoothing for deep reinforcement learning

DOI: 10.1016/j.neucom.2021.06.014
Publication Date: 2021-06-08
ABSTRACT
A Reinforcement Learning (RL) agent interacts with its environment to learn a policy with high accumulated reward through repeated attempts and failures. However, this trial-and-error nature of RL makes the learning process unstable. In this paper, we investigate a common phenomenon called rewards drop, in which the reward trajectory oscillates dramatically late in training. To address this problem, we propose a novel reward shaping technique named Variance Aware Rewards Smoothing (VAR). We show that the proposed method reduces the variance of rewards and mitigates the rewards drop problem without changing the formulation of the value function. Furthermore, we provide a theoretical convergence analysis of VAR, derived from the γ-contraction operator and the fixed-point attribute of the value function. Finally, the theoretical results are illustrated by extensive experiments on various benchmarks with advanced algorithms across different random seeds, demonstrating the effectiveness and compatibility of VAR.
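
The abstract does not spell out VAR's update rule. Purely as an illustration of the general idea, and not the authors' algorithm, the sketch below smooths a reward trajectory with an exponential moving average whose step size shrinks as the variance of recent rewards grows; all names, constants, and the specific update rule are assumptions made for this example.

```python
"""Illustrative sketch only: the paper does not publish this exact code.

Assumption: "variance aware" smoothing is approximated here by an
exponential moving average whose step size shrinks when the variance of
recent rewards grows, so noisy late-stage rewards are damped more.
The class name, parameters, and update rule are hypothetical.
"""
from collections import deque
import statistics


class VarianceAwareSmoother:
    def __init__(self, alpha: float = 0.3, window: int = 20):
        self.alpha = alpha                   # base smoothing rate
        self.window = deque(maxlen=window)   # recent raw rewards
        self.smoothed = None                 # current smoothed estimate

    def update(self, reward: float) -> float:
        """Blend a new reward into the running estimate.

        The effective step size is alpha / (1 + recent variance), so a
        noisy stretch of training moves the estimate more slowly.
        """
        self.window.append(reward)
        if self.smoothed is None:
            self.smoothed = reward
            return self.smoothed
        var = statistics.pvariance(self.window) if len(self.window) > 1 else 0.0
        step = self.alpha / (1.0 + var)
        self.smoothed += step * (reward - self.smoothed)
        return self.smoothed


if __name__ == "__main__":
    smoother = VarianceAwareSmoother()
    for r in [1.0, 1.2, 0.9, 5.0, -3.0, 1.1]:  # toy reward trajectory
        print(f"raw={r:+.1f}  smoothed={smoother.update(r):+.3f}")
```

Under this assumed rule, a volatile stretch of rewards moves the smoothed estimate more slowly than a stable one, which is the qualitative behavior the abstract attributes to VAR.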