VIME: Variational Information Maximizing Exploration

FOS: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
DOI: 10.48550/arxiv.1605.09674
Publication Date: 2016
ABSTRACT
Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks, which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration strategies across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.
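
For concreteness, the Python (PyTorch) sketch below illustrates the kind of exploration bonus the abstract describes: maintain a factorized Gaussian variational posterior over the weights of a learned dynamics model, and reward each transition by the KL divergence between the posterior after and before updating on that transition, i.e. the approximate information gain. This is a minimal sketch under stated assumptions, not the authors' released implementation: the names BayesianLinear, BayesianDynamicsModel, vime_bonus, and eta are illustrative, and the MSE loss stands in for the variational lower bound optimized in the paper.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    # Linear layer with a fully factorized Gaussian posterior over its
    # weights; biases are kept as point estimates to keep the sketch short.
    def __init__(self, n_in, n_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.rho = nn.Parameter(torch.full((n_out, n_in), -3.0))  # sigma = softplus(rho)
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward(self, x):
        sigma = F.softplus(self.rho)
        weight = self.mu + sigma * torch.randn_like(self.mu)  # reparameterization trick
        return F.linear(x, weight, self.bias)

    def kl_to(self, old):
        # KL( N(mu, sigma^2) || N(mu_old, sigma_old^2) ), summed over all weights.
        s, s_old = F.softplus(self.rho), F.softplus(old.rho)
        return (torch.log(s_old / s)
                + (s ** 2 + (self.mu - old.mu) ** 2) / (2 * s_old ** 2)
                - 0.5).sum()

class BayesianDynamicsModel(nn.Module):
    # Predicts the next state from (state, action) with Bayesian layers.
    def __init__(self, obs_dim, act_dim, hidden=32):
        super().__init__()
        self.layers = nn.ModuleList([
            BayesianLinear(obs_dim + act_dim, hidden),
            BayesianLinear(hidden, obs_dim),
        ])

    def forward(self, s, a):
        h = torch.tanh(self.layers[0](torch.cat([s, a], dim=-1)))
        return self.layers[1](h)

def vime_bonus(model, optimizer, s, a, s_next, n_steps=1):
    # Information gain of one transition, approximated as the KL divergence
    # between the variational posterior after and before fitting it.
    old = copy.deepcopy(model)  # snapshot of the current posterior parameters
    for _ in range(n_steps):
        optimizer.zero_grad()
        # MSE stands in here for the variational lower bound used in the paper.
        loss = F.mse_loss(model(s, a), s_next)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return sum(l.kl_to(o) for l, o in zip(model.layers, old.layers)).item()

# Example usage: augment the environment reward as r' = r + eta * information
# gain, where eta trades off exploration against exploitation (all values
# below are placeholders, not from the paper).
obs_dim, act_dim, eta = 4, 1, 1e-3
model = BayesianDynamicsModel(obs_dim, act_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
s, a, s_next = torch.randn(1, obs_dim), torch.randn(1, act_dim), torch.randn(1, obs_dim)
r_augmented = 1.0 + eta * vime_bonus(model, optimizer, s, a, s_next)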