- Reinforcement Learning in Robotics
- Robot Manipulation and Learning
- Robotic Locomotion and Control
- Advanced Multi-Objective Optimization Algorithms
- Machine Learning and Algorithms
- Adaptive Dynamic Programming Control
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Advanced Bandit Algorithms Research
- Robotic Path Planning Algorithms
- Evolutionary Algorithms and Applications
- Prosthetics and Rehabilitation Robotics
- Optimization and Search Problems
- Metaheuristic Optimization Algorithms Research
- Explainable Artificial Intelligence (XAI)
- Advanced Control Systems Optimization
- Model Reduction and Neural Networks
- Viral Infectious Diseases and Gene Expression in Insects
- Innovations in Concrete and Construction Materials
- Fuel Cells and Related Materials
- Winter Sports Injuries and Performance
- Human Pose and Action Recognition
- Neural Networks and Applications
- Auction Theory and Applications
- Stochastic Gradient Optimization Techniques
Google (United Kingdom)
2019-2024
DeepMind (United Kingdom)
2019-2024
Tarbiat Modares University
2023
Google (United States)
2019-2021
Corvallis Environmental Center
2020
University of Aveiro
2011-2019
Universidade do Porto
2015-2017
University of Minho
2015-2017
University of Isfahan
2011-2012
Qazvin Islamic Azad University
2009
The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly available at https://www.github.com/deepmind/dm_control . A video summary of all tasks is available at http://youtu.be/rAai4QzcYbs
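A minimal usage sketch following the dm_control README: load one of the bundled (domain, task) benchmarks and run a uniform-random policy for one episode. The cartpole swingup pair is one of the included tasks.

```python
import numpy as np
from dm_control import suite  # pip install dm_control

# Load a benchmark task as a (domain, task) pair.
env = suite.load(domain_name="cartpole", task_name="swingup")
spec = env.action_spec()

# Run one episode with a uniform-random policy.
time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    episode_return += time_step.reward
print("return:", episode_return)
```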
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns...
We introduce a new algorithm for reinforcement learning called Maximum a-posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence and robustness to hyperparameter settings...
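A numpy sketch of the E-step reweighting this derivation leads to: sampled actions are weighted by exp(Q/eta), where eta is the temperature induced by the relative-entropy bound. In the full algorithm eta is obtained by solving a convex dual problem, omitted here; the M-step then fits the parametric policy by weighted maximum likelihood under a KL trust region.

```python
import numpy as np

def mpo_estep_weights(q_values, eta):
    """E-step sketch: reweight sampled actions by exp(Q/eta); eta is the
    temperature tied to the relative-entropy (KL) bound. The full method
    finds eta by minimizing a convex dual, which is omitted here."""
    logits = q_values / eta
    logits -= logits.max()            # for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# M-step (in words): fit the parametric policy by weighted maximum
# likelihood, maximizing sum_i w_i * log pi(a_i | s), subject to a KL
# trust region against the previous policy.
q_values = np.array([1.0, 2.5, 0.3, 1.7])   # Q(s, a_i) for four sampled actions
print(mpo_estep_weights(q_values, eta=0.5))
```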
Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: although movements are ultimately executed as instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has...
No abstract available.
Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior --...
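A hedged sketch of one plausible way such a behavior prior could be fit from batch data, assuming per-transition advantage estimates are available; the function name and the two filtering variants are illustrative simplifications, not the paper's exact formulation.

```python
import numpy as np

def prior_weights(advantages, mode="binary"):
    """Weights for fitting a behavior-model prior on the batch: up-weight
    actions that look at least as good as the current policy. Both variants
    are illustrative simplifications."""
    if mode == "binary":
        return (advantages >= 0.0).astype(float)   # hard filter
    return np.exp(advantages - advantages.max())   # soft, exponentiated filter

adv = np.array([-0.4, 0.2, 1.1, -0.1])
print(prior_weights(adv))   # the prior is then fit by weighted max-likelihood
```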
We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and iii) generalization by fitting a parametric policy. Each step can be implemented in different ways, giving rise to several algorithm variants. Our algorithm draws...
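A toy, self-contained instance of the three-step loop on a one-dimensional bandit, where the true reward stands in for a learned action-value function; the exponentiated-value weighting in step ii) is one of several possible choices.

```python
import numpy as np

rng = np.random.default_rng(0)
reward = lambda a: -(a - 2.0) ** 2           # toy task: optimal action is a = 2
mu, sigma = 0.0, 2.0                         # parametric Gaussian policy N(mu, sigma^2)

for _ in range(20):
    # i) policy evaluation: sample actions and estimate their values
    actions = rng.normal(mu, sigma, size=64)
    q_hat = reward(actions)                  # true reward stands in for a learned Q
    # ii) local improvement: non-parametric policy via exponentiated values
    w = np.exp(q_hat - q_hat.max())
    w /= w.sum()
    # iii) generalization: weighted maximum-likelihood fit of the Gaussian
    mu = np.sum(w * actions)
    sigma = np.sqrt(np.sum(w * (actions - mu) ** 2)) + 1e-3

print(mu)   # converges towards 2.0
```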
Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in underlying architectures being trained as well complexity RL algorithms used train them. These increases turn made it more difficult for researchers rapidly prototype new ideas or reproduce published algorithms. To address concerns this work describes Acme, a framework constructing novel that is specifically designed enable agents...
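Acme's actual APIs are not reproduced here; the sketch below only illustrates the actor/environment-loop decomposition that such a framework formalizes, with hypothetical names (`RandomActor`, `run_episode`) and a toy environment so the snippet runs on its own.

```python
import numpy as np

class RandomActor:
    """Stands in for an Acme-style Actor: it selects actions and observes
    transitions; a learner component would consume those observations."""
    def __init__(self, rng):
        self.rng = rng

    def select_action(self, observation):
        return self.rng.uniform(-1.0, 1.0)

    def observe(self, action, reward, next_observation):
        pass  # a real agent would feed a replay buffer / learner here

def run_episode(env_reset, env_step, actor, max_steps=100):
    """Bare-bones environment loop in the spirit of this decomposition."""
    obs, total = env_reset(), 0.0
    for _ in range(max_steps):
        action = actor.select_action(obs)
        obs, reward, done = env_step(action)
        actor.observe(action, reward, obs)
        total += reward
        if done:
            break
    return total

# Toy scalar environment to make the sketch runnable.
state = {"x": 0.0}
def env_reset():
    state["x"] = 0.0
    return state["x"]
def env_step(action):
    state["x"] += action
    return state["x"], -abs(state["x"]), abs(state["x"]) > 5.0

print(run_episode(env_reset, env_step, RandomActor(np.random.default_rng(0))))
```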
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics, which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on the state-of-the-art RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes a worst case expected return objective, and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition,...
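A minimal sketch of a worst-case Bellman backup, assuming the uncertainty set over transition dynamics is approximated by a finite ensemble of next-state value estimates; the entropy-regularization term in the paper's operator is omitted.

```python
import numpy as np

def robust_backup(reward, next_values_per_model, gamma=0.99):
    """Worst-case Bellman target: take the minimum next-state value over
    a finite ensemble of dynamics models (rows), per state-action (columns)."""
    worst_next = np.min(next_values_per_model, axis=0)
    return reward + gamma * worst_next

# Three candidate dynamics models, two state-action pairs.
targets = robust_backup(
    reward=np.array([1.0, 0.5]),
    next_values_per_model=np.array([[2.0, 1.0],
                                    [1.5, 1.2],
                                    [2.2, 0.8]]))
print(targets)   # uses 1.5 and 0.8, the pessimistic values
```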
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value...
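A small numpy sketch of the sample weighting this kind of method uses: retain the better half of samples by advantage, then softmax their advantages at a temperature eta (which the full method again obtains from a dual, omitted here).

```python
import numpy as np

def vmpo_weights(advantages, eta=1.0):
    """Keep the better half of samples by advantage, then softmax their
    advantages at temperature eta; the rest get zero weight."""
    keep = advantages >= np.median(advantages)
    w = np.zeros_like(advantages, dtype=float)
    a = advantages[keep] / eta
    w[keep] = np.exp(a - a.max())
    w[keep] /= w[keep].sum()
    return w

print(vmpo_weights(np.array([0.1, -0.5, 1.2, 0.3])))
```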
The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially as bang-bang control. Although such solutions may indeed maximize task reward, they can be unsuitable for real world systems. Bang-bang control may lead to increased wear and tear or energy consumption, and tends to excite undesired second-order dynamics. To counteract this issue, multi-objective...
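One common counter-measure, sketched below, is to treat smoothness as an extra objective: penalize action magnitude and action rate alongside the task reward. The weights and the exact penalty form are illustrative, not the paper's scheme.

```python
import numpy as np

def shaped_reward(task_reward, action, prev_action, w_mag=0.1, w_rate=0.1):
    """Multi-objective shaping: task reward minus penalties on action
    magnitude and action rate (discourages bang-bang solutions)."""
    mag_cost = np.sum(np.square(action))
    rate_cost = np.sum(np.square(action - prev_action))
    return task_reward - w_mag * mag_cost - w_rate * rate_cost

print(shaped_reward(1.0, np.array([0.9, -0.8]), np.array([-0.9, 0.7])))
```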
Many real-world control problems involve both discrete decision variables -- such as the choice of control modes, gear switching or digital outputs -- as well as continuous decision variables -- such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully discrete or fully continuous action spaces. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used...
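A sketch of sampling from a factored hybrid policy, assuming the discrete and continuous parts are modeled independently as a categorical and a diagonal Gaussian; function and argument names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(mode_logits, cont_mean, cont_std):
    """Sample from a factored hybrid policy: a categorical over discrete
    modes and an independent Gaussian over continuous setpoints."""
    p = np.exp(mode_logits - mode_logits.max())
    p /= p.sum()
    mode = rng.choice(len(p), p=p)
    setpoint = rng.normal(cont_mean, cont_std)
    return mode, setpoint

mode, setpoint = sample_hybrid_action(np.array([0.1, 1.2, -0.3]),
                                      cont_mean=np.array([0.5]),
                                      cont_std=np.array([0.2]))
print(mode, setpoint)
```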
CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance matrix updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well established heuristics, and the theoretical foundations of some of these rules are also understood. In this paper we will fully derive all...
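A deliberately simplified evolution-strategy loop in the spirit of CMA-ES: weighted recombination of the best mu samples and a rank-mu covariance update. Evolution paths and proper step-size adaptation are replaced by crude annealing, so this is a sketch of the structure, not the derived algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum((x - 3.0) ** 2)              # toy objective, minimized at x = 3

dim, lam, mu = 5, 20, 10                          # dimension, population, parents
mean, sigma, C = np.zeros(dim), 1.0, np.eye(dim)
weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
weights /= weights.sum()                          # log-rank recombination weights

for _ in range(100):
    z = rng.multivariate_normal(np.zeros(dim), C, size=lam)
    x = mean + sigma * z
    order = np.argsort([f(xi) for xi in x])[:mu]  # select the mu best samples
    mean = x[order].T @ weights                   # weighted recombination
    C = 0.9 * C + 0.1 * (z[order].T * weights) @ z[order]   # rank-mu update
    sigma *= 0.97                                 # crude annealing, not CSA

print(mean)   # approaches [3, 3, 3, 3, 3]
```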
Bipedal locomotion is one of the most challenging problems in control, artificial intelligence, mechanics and other related fields. In this article a model free approach, with emphasis on making the robot's walking more stable and faster, is presented. In this regard we use particle swarm optimization (PSO) to optimize the signals produced by a truncated Fourier series (TFS) which control the joints' angles. The role of the hands is also considered, to smooth the walking and increase its robustness. For the first time a new method will be introduced to improve...
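A compact sketch of the two ingredients under a stand-in fitness: a truncated Fourier series generating one joint-angle trajectory, and a vanilla PSO optimizing its coefficients. A real setup would roll out the gait in simulation and score stability and speed.

```python
import numpy as np

def tfs_angle(t, coeffs, omega=2 * np.pi):
    """Truncated Fourier series for one joint-angle trajectory:
    a0 + sum_k (a_k sin(k w t) + b_k cos(k w t))."""
    a0, rest = coeffs[0], coeffs[1:]
    n = len(rest) // 2
    k = np.arange(1, n + 1)
    return a0 + np.sum(rest[:n] * np.sin(k * omega * t) + rest[n:] * np.cos(k * omega * t))

def fitness(c):
    """Stand-in fitness: match a target gait signal over one period."""
    t = np.linspace(0.0, 1.0, 50)
    traj = np.array([tfs_angle(ti, c) for ti in t])
    return -np.sum((traj - np.sin(2 * np.pi * t)) ** 2)

# Vanilla PSO over the TFS coefficients [a0, a1, a2, b1, b2].
rng = np.random.default_rng(0)
n_particles, dim = 30, 5
x = rng.normal(size=(n_particles, dim))
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), np.array([fitness(c) for c in x])
gbest = pbest[pbest_f.argmax()]

for _ in range(100):
    r1, r2 = rng.random((2, n_particles, dim))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
    x = x + v
    f = np.array([fitness(c) for c in x])
    better = f > pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[pbest_f.argmax()]

print(gbest)   # approx [0, 1, 0, 0, 0]: the sin(2*pi*t) term dominates
```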
Many of the recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is by bounding the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success in challenging problems such as end-to-end control of physical systems. However, the linear approximation can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new...
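A small sketch of a KL-bounded update for diagonal Gaussian policies: backtrack a proposed update until KL(new || old) falls under the bound. The backtracking line search is a simple stand-in for the dual-based step-size computation used in these methods.

```python
import numpy as np

def kl_diag_gauss(mu_p, sig_p, mu_q, sig_q):
    """KL(p || q) between diagonal Gaussian policies."""
    return np.sum(np.log(sig_q / sig_p)
                  + (sig_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sig_q ** 2) - 0.5)

def kl_constrained_step(mu, sig, mu_prop, sig_prop, epsilon=0.05):
    """Shrink a proposed policy update until KL(new || old) <= epsilon."""
    alpha = 1.0
    while alpha > 1e-6:
        mu_new = mu + alpha * (mu_prop - mu)
        sig_new = sig + alpha * (sig_prop - sig)
        if kl_diag_gauss(mu_new, sig_new, mu, sig) <= epsilon:
            return mu_new, sig_new
        alpha *= 0.5
    return mu, sig

mu, sig = np.zeros(2), np.ones(2)
print(kl_constrained_step(mu, sig, mu + 1.0, sig * 0.5))
```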
We present a method for fast training of vision based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary tasks to utilize features that are only available at training-time. This allows for fast learning of these auxiliary policies, which subsequently generate good data for training the main, vision-based policies. This method can be seen as an extension of Scheduled Auxiliary Control...
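A sketch of the two ideas under stated assumptions: auxiliary tasks read privileged, training-time features the main vision task cannot, and a scheduler picks which task to run next. The softmax-over-returns scheduler is a simple stand-in, not the paper's scheduling mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-task observation extractors: the auxiliary task reads a
# privileged, training-time feature (object pose) that the main vision-based
# task does not get to see.
observation_for = {
    "main_vision": lambda full_state: full_state["pixels"],
    "aux_reach":   lambda full_state: full_state["object_pose"],
}

def schedule_task(recent_returns, temperature=1.0):
    """Pick the next task to execute: softmax over mean recent returns,
    a simple stand-in for a learned scheduler."""
    names = list(recent_returns)
    r = np.array([np.mean(v) if v else 0.0 for v in recent_returns.values()])
    p = np.exp((r - r.max()) / temperature)
    p /= p.sum()
    return names[rng.choice(len(names), p=p)]

print(schedule_task({"main_vision": [0.1], "aux_reach": [0.6]}))
```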
Learning robotic control policies in the real world gives rise to challenges in data efficiency, safety, and controlling the initial condition of the system. On the other hand, simulations are a useful alternative as they provide an abundant source of data without the restrictions of the real world. Unfortunately, simulations often fail to accurately model complex real-world phenomena. Traditional system identification techniques are limited in expressiveness by the analytical model parameters, and are usually not sufficient to capture such phenomena. In this paper we propose...
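A minimal sketch of one way to move past purely analytical system identification: fit a correction term on real transitions on top of the simulator's prediction. A linear least-squares residual stands in here for a more expressive learned model; all names are illustrative.

```python
import numpy as np

def analytical_model(states, actions, dt=0.01):
    """Idealized physics prediction (stand-in for a simulator step)."""
    return states + dt * actions

def fit_residual(states, actions, next_states_real):
    """Fit a correction on top of the analytical model from real-world
    transitions; linear least squares stands in for a learned network."""
    pred = analytical_model(states, actions)
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states_real - pred, rcond=None)
    return lambda s, a: (analytical_model(s, a)
                         + np.hstack([s, a, np.ones((len(s), 1))]) @ W)

rng = np.random.default_rng(0)
S, A = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))
S_next = analytical_model(S, A) + 0.03 * S - 0.05 * A   # unmodeled linear effects
hybrid = fit_residual(S, A, S_next)
print(np.abs(hybrid(S, A) - S_next).max())   # near zero: residual recovered
```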