MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning

DOI: 10.48550/arxiv.2109.10552 Publication Date: 2021-01-01
ABSTRACT
During the training of a reinforcement learning (RL) agent, the distribution of training data is non-stationary because the agent's behavior changes over time. There is therefore a risk that the agent becomes overspecialized to a particular distribution and that its performance suffers in the larger picture. Ensemble RL can mitigate this issue by learning a robust policy, but it suffers from heavy computational resource consumption due to the newly introduced value and policy functions. In this paper, to avoid this notorious resource-consumption issue, we design a novel and simple ensemble deep RL framework that integrates multiple models into a single model. Specifically, we propose the Minimalist Ensemble Policy Gradient framework (MEPG), which introduces a minimalist consistent Bellman update utilizing a modified dropout operator. MEPG holds the ensemble property by keeping dropout consistency on both sides of the Bellman equation. Additionally, the dropout operator also increases MEPG's generalization capability. Moreover, we theoretically show that the policy evaluation phase maintains two synchronized Gaussian Processes. To verify the framework's ability to generalize, we perform experiments on the gym simulator, which show that MEPG outperforms or achieves a similar level of performance to current state-of-the-art ensemble and model-free methods without incurring additional computational costs.
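The core mechanism described above is a consistent Bellman update in which a modified dropout operator applies the same dropout mask to both sides of the Bellman equation, so each sampled sub-network is evaluated and updated consistently. Below is a minimal PyTorch sketch of that idea; the network sizes, the MaskedQNetwork and consistent_bellman_loss names, and the mask-sharing details are illustrative assumptions, not the paper's reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedQNetwork(nn.Module):
    """Q-network whose hidden activations are multiplied by an externally supplied dropout mask."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.l1 = nn.Linear(state_dim + action_dim, hidden_dim)
        self.l2 = nn.Linear(hidden_dim, hidden_dim)
        self.l3 = nn.Linear(hidden_dim, 1)

    def forward(self, state, action, mask):
        x = F.relu(self.l1(torch.cat([state, action], dim=-1)))
        x = x * mask                      # shared dropout mask selects one sub-network
        x = F.relu(self.l2(x))
        return self.l3(x)


def consistent_bellman_loss(q_net, q_target, policy, batch, p_drop=0.1, gamma=0.99):
    """TD loss in which the current and target Q-values share one dropout mask."""
    state, action, reward, next_state, done = batch
    hidden_dim = q_net.l1.out_features
    # Sample one Bernoulli mask per batch (inverted-dropout scaling keeps the expectation unchanged).
    mask = (torch.rand(state.size(0), hidden_dim) > p_drop).float() / (1.0 - p_drop)
    with torch.no_grad():
        next_action = policy(next_state)
        target_q = q_target(next_state, next_action, mask)   # same mask on the target side
        target = reward + gamma * (1.0 - done) * target_q
    current_q = q_net(state, action, mask)                   # same mask on the current side
    return F.mse_loss(current_q, target)

Because a fresh mask is drawn at every update, the single network behaves like an implicit ensemble of sub-networks, which is the resource-saving point the abstract emphasizes.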