MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning

DOI: 10.48550/arxiv.2109.10552 Publication Date: 2021-01-01
ABSTRACT
During the training of a reinforcement learning (RL) agent, the distribution of training data is non-stationary because the agent's behavior changes over time. There is therefore a risk that the agent becomes overspecialized to a particular distribution and that its performance suffers in the larger picture. Ensemble RL can mitigate this issue by learning a robust policy, but it suffers from heavy computational resource consumption due to the newly introduced value and policy functions. In this paper, to avoid this notorious resource-consumption issue, we design a novel and simple ensemble deep RL framework that integrates multiple models into a single model. Specifically, we propose the Minimalist Ensemble Policy Gradient framework (MEPG), which introduces a minimalist consistent Bellman update utilizing a modified dropout operator. MEPG holds the ensemble property by keeping dropout consistency on both sides of the Bellman equation. Additionally, the dropout operator also increases MEPG's generalization capability. Moreover, we theoretically show that the policy evaluation phase maintains two synchronized Gaussian Processes. To verify the framework's ability to generalize, we perform experiments on the gym simulator, which show that MEPG outperforms or achieves a similar level of performance to current state-of-the-art ensemble and model-free methods without incurring additional computational costs.
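The core mechanism described above is a consistent Bellman update in which a modified dropout operator applies the same dropout mask to both sides of the Bellman equation, so each sampled sub-network is evaluated and updated consistently. Below is a minimal PyTorch sketch of that idea; the network sizes, the MaskedQNetwork and consistent_bellman_loss names, and the mask-sharing details are illustrative assumptions, not the paper's reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedQNetwork(nn.Module):
    """Q-network whose hidden activations are multiplied by an externally supplied dropout mask."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.l1 = nn.Linear(state_dim + action_dim, hidden_dim)
        self.l2 = nn.Linear(hidden_dim, hidden_dim)
        self.l3 = nn.Linear(hidden_dim, 1)

    def forward(self, state, action, mask):
        x = F.relu(self.l1(torch.cat([state, action], dim=-1)))
        x = x * mask                      # shared dropout mask selects one sub-network
        x = F.relu(self.l2(x))
        return self.l3(x)


def consistent_bellman_loss(q_net, q_target, policy, batch, p_drop=0.1, gamma=0.99):
    """TD loss in which the current and target Q-values share one dropout mask."""
    state, action, reward, next_state, done = batch
    hidden_dim = q_net.l1.out_features
    # Sample one Bernoulli mask per batch (inverted-dropout scaling keeps the expectation unchanged).
    mask = (torch.rand(state.size(0), hidden_dim) > p_drop).float() / (1.0 - p_drop)
    with torch.no_grad():
        next_action = policy(next_state)
        target_q = q_target(next_state, next_action, mask)   # same mask on the target side
        target = reward + gamma * (1.0 - done) * target_q
    current_q = q_net(state, action, mask)                   # same mask on the current side
    return F.mse_loss(current_q, target)

Because a fresh mask is drawn at every update, the single network behaves like an implicit ensemble of sub-networks, which is the resource-saving point the abstract emphasizes.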