- Reinforcement Learning in Robotics
- Robot Manipulation and Learning
- Robotic Locomotion and Control
- Advanced Multi-Objective Optimization Algorithms
- Machine Learning and Algorithms
- Adaptive Dynamic Programming Control
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Advanced Bandit Algorithms Research
- Robotic Path Planning Algorithms
- Evolutionary Algorithms and Applications
- Prosthetics and Rehabilitation Robotics
- Optimization and Search Problems
- Metaheuristic Optimization Algorithms Research
- Explainable Artificial Intelligence (XAI)
- Advanced Control Systems Optimization
- Model Reduction and Neural Networks
- Viral Infectious Diseases and Gene Expression in Insects
- Innovations in Concrete and Construction Materials
- Fuel Cells and Related Materials
- Winter Sports Injuries and Performance
- Human Pose and Action Recognition
- Neural Networks and Applications
- Auction Theory and Applications
- Stochastic Gradient Optimization Techniques
Google (United Kingdom)
2019-2024
DeepMind (United Kingdom)
2019-2024
Tarbiat Modares University
2023
Google (United States)
2019-2021
Corvallis Environmental Center
2020
University of Aveiro
2011-2019
Universidade do Porto
2015-2017
University of Minho
2015-2017
University of Isfahan
2011-2012
Qazvin Islamic Azad University
2009
The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly available at https://www.github.com/deepmind/dm_control . A video summary of all tasks is available at http://youtu.be/rAai4QzcYbs
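A minimal usage sketch following the dm_control README: load one of the bundled (domain, task) benchmarks and run a uniform-random policy for one episode. The cartpole swingup pair is one of the included tasks.

```python
import numpy as np
from dm_control import suite  # pip install dm_control

# Load a benchmark task as a (domain, task) pair.
env = suite.load(domain_name="cartpole", task_name="swingup")
spec = env.action_spec()

# Run one episode with a uniform-random policy.
time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    episode_return += time_step.reward
print("return:", episode_return)
```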
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns...
We introduce a new algorithm for reinforcement learning called Maximum a-posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence and robustness to hyperparameter settings...
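A numpy sketch of the E-step reweighting this derivation leads to: sampled actions are weighted by exp(Q/eta), where eta is the temperature induced by the relative-entropy bound. In the full algorithm eta is obtained by solving a convex dual problem, omitted here; the M-step then fits the parametric policy by weighted maximum likelihood under a KL trust region.

```python
import numpy as np

def mpo_estep_weights(q_values, eta):
    """E-step sketch: reweight sampled actions by exp(Q/eta); eta is the
    temperature tied to the relative-entropy (KL) bound. The full method
    finds eta by minimizing a convex dual, which is omitted here."""
    logits = q_values / eta
    logits -= logits.max()            # for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# M-step (in words): fit the parametric policy by weighted maximum
# likelihood, maximizing sum_i w_i * log pi(a_i | s), subject to a KL
# trust region against the previous policy.
q_values = np.array([1.0, 2.5, 0.3, 1.7])   # Q(s, a_i) for four sampled actions
print(mpo_estep_weights(q_values, eta=0.5))
```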
Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: although movements are ultimately executed as instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has...
No abstract available.
Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior --...
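A hedged sketch of one plausible way such a behavior prior could be fit from batch data, assuming per-transition advantage estimates are available; the function name and the two filtering variants are illustrative simplifications, not the paper's exact formulation.

```python
import numpy as np

def prior_weights(advantages, mode="binary"):
    """Weights for fitting a behavior-model prior on the batch: up-weight
    actions that look at least as good as the current policy. Both variants
    are illustrative simplifications."""
    if mode == "binary":
        return (advantages >= 0.0).astype(float)   # hard filter
    return np.exp(advantages - advantages.max())   # soft, exponentiated filter

adv = np.array([-0.4, 0.2, 1.1, -0.1])
print(prior_weights(adv))   # the prior is then fit by weighted max-likelihood
```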
We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and iii) generalization by fitting a parametric policy. Each step can be implemented in different ways, giving rise to several algorithm variants. Our algorithm draws...
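A toy, self-contained instance of the three-step loop on a one-dimensional bandit, where the true reward stands in for a learned action-value function; the exponentiated-value weighting in step ii) is one of several possible choices.

```python
import numpy as np

rng = np.random.default_rng(0)
reward = lambda a: -(a - 2.0) ** 2           # toy task: optimal action is a = 2
mu, sigma = 0.0, 2.0                         # parametric Gaussian policy N(mu, sigma^2)

for _ in range(20):
    # i) policy evaluation: sample actions and estimate their values
    actions = rng.normal(mu, sigma, size=64)
    q_hat = reward(actions)                  # true reward stands in for a learned Q
    # ii) local improvement: non-parametric policy via exponentiated values
    w = np.exp(q_hat - q_hat.max())
    w /= w.sum()
    # iii) generalization: weighted maximum-likelihood fit of the Gaussian
    mu = np.sum(w * actions)
    sigma = np.sqrt(np.sum(w * (actions - mu) ** 2)) + 1e-3

print(mu)   # converges towards 2.0
```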
Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in underlying architectures being trained as well complexity RL algorithms used train them. These increases turn made it more difficult for researchers rapidly prototype new ideas or reproduce published algorithms. To address concerns this work describes Acme, a framework constructing novel that is specifically designed enable agents...
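Acme's actual APIs are not reproduced here; the sketch below only illustrates the actor/environment-loop decomposition that such a framework formalizes, with hypothetical names (`RandomActor`, `run_episode`) and a toy environment so the snippet runs on its own.

```python
import numpy as np

class RandomActor:
    """Stands in for an Acme-style Actor: it selects actions and observes
    transitions; a learner component would consume those observations."""
    def __init__(self, rng):
        self.rng = rng

    def select_action(self, observation):
        return self.rng.uniform(-1.0, 1.0)

    def observe(self, action, reward, next_observation):
        pass  # a real agent would feed a replay buffer / learner here

def run_episode(env_reset, env_step, actor, max_steps=100):
    """Bare-bones environment loop in the spirit of this decomposition."""
    obs, total = env_reset(), 0.0
    for _ in range(max_steps):
        action = actor.select_action(obs)
        obs, reward, done = env_step(action)
        actor.observe(action, reward, obs)
        total += reward
        if done:
            break
    return total

# Toy scalar environment to make the sketch runnable.
state = {"x": 0.0}
def env_reset():
    state["x"] = 0.0
    return state["x"]
def env_step(action):
    state["x"] += action
    return state["x"], -abs(state["x"]), abs(state["x"]) > 5.0

print(run_episode(env_reset, env_step, RandomActor(np.random.default_rng(0))))
```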
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics, which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on the state-of-the-art RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes a worst case expected return objective, and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition,...
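A minimal sketch of a worst-case Bellman backup, assuming the uncertainty set over transition dynamics is approximated by a finite ensemble of next-state value estimates; the entropy-regularization term in the paper's operator is omitted.

```python
import numpy as np

def robust_backup(reward, next_values_per_model, gamma=0.99):
    """Worst-case Bellman target: take the minimum next-state value over
    a finite ensemble of dynamics models (rows), per state-action (columns)."""
    worst_next = np.min(next_values_per_model, axis=0)
    return reward + gamma * worst_next

# Three candidate dynamics models, two state-action pairs.
targets = robust_backup(
    reward=np.array([1.0, 0.5]),
    next_values_per_model=np.array([[2.0, 1.0],
                                    [1.5, 1.2],
                                    [2.2, 0.8]]))
print(targets)   # uses 1.5 and 0.8, the pessimistic values
```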
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value...
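A small numpy sketch of the sample weighting this kind of method uses: retain the better half of samples by advantage, then softmax their advantages at a temperature eta (which the full method again obtains from a dual, omitted here).

```python
import numpy as np

def vmpo_weights(advantages, eta=1.0):
    """Keep the better half of samples by advantage, then softmax their
    advantages at temperature eta; the rest get zero weight."""
    keep = advantages >= np.median(advantages)
    w = np.zeros_like(advantages, dtype=float)
    a = advantages[keep] / eta
    w[keep] = np.exp(a - a.max())
    w[keep] /= w[keep].sum()
    return w

print(vmpo_weights(np.array([0.1, -0.5, 1.2, 0.3])))
```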
The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially as bang-bang control. Although such solutions may indeed maximize task reward, they can be unsuitable for real world systems. Bang-bang control may lead to increased wear and tear or energy consumption, and tends to excite undesired second-order dynamics. To counteract this issue, multi-objective...
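One common counter-measure, sketched below, is to treat smoothness as an extra objective: penalize action magnitude and action rate alongside the task reward. The weights and the exact penalty form are illustrative, not the paper's scheme.

```python
import numpy as np

def shaped_reward(task_reward, action, prev_action, w_mag=0.1, w_rate=0.1):
    """Multi-objective shaping: task reward minus penalties on action
    magnitude and action rate (discourages bang-bang solutions)."""
    mag_cost = np.sum(np.square(action))
    rate_cost = np.sum(np.square(action - prev_action))
    return task_reward - w_mag * mag_cost - w_rate * rate_cost

print(shaped_reward(1.0, np.array([0.9, -0.8]), np.array([-0.9, 0.7])))
```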
Many real-world control problems involve both discrete decision variables -- such as the choice of control modes, gear switching or digital outputs -- as well as continuous decision variables -- such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully discrete or fully continuous action spaces. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used...
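A sketch of sampling from a factored hybrid policy, assuming the discrete and continuous parts are modeled independently as a categorical and a diagonal Gaussian; function and argument names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(mode_logits, cont_mean, cont_std):
    """Sample from a factored hybrid policy: a categorical over discrete
    modes and an independent Gaussian over continuous setpoints."""
    p = np.exp(mode_logits - mode_logits.max())
    p /= p.sum()
    mode = rng.choice(len(p), p=p)
    setpoint = rng.normal(cont_mean, cont_std)
    return mode, setpoint

mode, setpoint = sample_hybrid_action(np.array([0.1, 1.2, -0.3]),
                                      cont_mean=np.array([0.5]),
                                      cont_std=np.array([0.2]))
print(mode, setpoint)
```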
CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance matrix updates that incorporate the current samples as well as the evolution path, and its invariance properties. Its update rules are composed of well established heuristics, and the theoretical foundations of some of these rules are also understood. In this paper we will fully derive all...
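A deliberately simplified evolution-strategy loop in the spirit of CMA-ES: weighted recombination of the best mu samples and a rank-mu covariance update. Evolution paths and proper step-size adaptation are replaced by crude annealing, so this is a sketch of the structure, not the derived algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum((x - 3.0) ** 2)              # toy objective, minimized at x = 3

dim, lam, mu = 5, 20, 10                          # dimension, population, parents
mean, sigma, C = np.zeros(dim), 1.0, np.eye(dim)
weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
weights /= weights.sum()                          # log-rank recombination weights

for _ in range(100):
    z = rng.multivariate_normal(np.zeros(dim), C, size=lam)
    x = mean + sigma * z
    order = np.argsort([f(xi) for xi in x])[:mu]  # select the mu best samples
    mean = x[order].T @ weights                   # weighted recombination
    C = 0.9 * C + 0.1 * (z[order].T * weights) @ z[order]   # rank-mu update
    sigma *= 0.97                                 # crude annealing, not CSA

print(mean)   # approaches [3, 3, 3, 3, 3]
```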
Bipedal locomotion is one of the most challenging problems in control, artificial intelligence, mechanics and other related fields. In this article a model free approach, with emphasis on making the robot's walking more stable and faster, is presented. In this regard we use particle swarm optimization (PSO) to optimize the signals produced by a truncated Fourier series (TFS) which control the joints' angles. The role of the hands is also considered, to smooth the walking and increase its robustness. For the first time a new method will be introduced to improve...
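A compact sketch of the two ingredients under a stand-in fitness: a truncated Fourier series generating one joint-angle trajectory, and a vanilla PSO optimizing its coefficients. A real setup would roll out the gait in simulation and score stability and speed.

```python
import numpy as np

def tfs_angle(t, coeffs, omega=2 * np.pi):
    """Truncated Fourier series for one joint-angle trajectory:
    a0 + sum_k (a_k sin(k w t) + b_k cos(k w t))."""
    a0, rest = coeffs[0], coeffs[1:]
    n = len(rest) // 2
    k = np.arange(1, n + 1)
    return a0 + np.sum(rest[:n] * np.sin(k * omega * t) + rest[n:] * np.cos(k * omega * t))

def fitness(c):
    """Stand-in fitness: match a target gait signal over one period."""
    t = np.linspace(0.0, 1.0, 50)
    traj = np.array([tfs_angle(ti, c) for ti in t])
    return -np.sum((traj - np.sin(2 * np.pi * t)) ** 2)

# Vanilla PSO over the TFS coefficients [a0, a1, a2, b1, b2].
rng = np.random.default_rng(0)
n_particles, dim = 30, 5
x = rng.normal(size=(n_particles, dim))
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), np.array([fitness(c) for c in x])
gbest = pbest[pbest_f.argmax()]

for _ in range(100):
    r1, r2 = rng.random((2, n_particles, dim))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
    x = x + v
    f = np.array([fitness(c) for c in x])
    better = f > pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[pbest_f.argmax()]

print(gbest)   # approx [0, 1, 0, 0, 0]: the sin(2*pi*t) term dominates
```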
Many of the recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is by bounding the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success in challenging problems such as end-to-end control of physical systems. However, the linear approximation can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new...
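A small sketch of a KL-bounded update for diagonal Gaussian policies: backtrack a proposed update until KL(new || old) falls under the bound. The backtracking line search is a simple stand-in for the dual-based step-size computation used in these methods.

```python
import numpy as np

def kl_diag_gauss(mu_p, sig_p, mu_q, sig_q):
    """KL(p || q) between diagonal Gaussian policies."""
    return np.sum(np.log(sig_q / sig_p)
                  + (sig_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sig_q ** 2) - 0.5)

def kl_constrained_step(mu, sig, mu_prop, sig_prop, epsilon=0.05):
    """Shrink a proposed policy update until KL(new || old) <= epsilon."""
    alpha = 1.0
    while alpha > 1e-6:
        mu_new = mu + alpha * (mu_prop - mu)
        sig_new = sig + alpha * (sig_prop - sig)
        if kl_diag_gauss(mu_new, sig_new, mu, sig) <= epsilon:
            return mu_new, sig_new
        alpha *= 0.5
    return mu, sig

mu, sig = np.zeros(2), np.ones(2)
print(kl_constrained_step(mu, sig, mu + 1.0, sig * 0.5))
```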
We present a method for fast training of vision based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary tasks to utilize features that are only available at training-time. This allows for fast learning of these auxiliary policies, which subsequently generate good data for training the main, vision-based policies. This method can be seen as an extension of Scheduled Auxiliary Control...
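A sketch of the two ideas under stated assumptions: auxiliary tasks read privileged, training-time features the main vision task cannot, and a scheduler picks which task to run next. The softmax-over-returns scheduler is a simple stand-in, not the paper's scheduling mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-task observation extractors: the auxiliary task reads a
# privileged, training-time feature (object pose) that the main vision-based
# task does not get to see.
observation_for = {
    "main_vision": lambda full_state: full_state["pixels"],
    "aux_reach":   lambda full_state: full_state["object_pose"],
}

def schedule_task(recent_returns, temperature=1.0):
    """Pick the next task to execute: softmax over mean recent returns,
    a simple stand-in for a learned scheduler."""
    names = list(recent_returns)
    r = np.array([np.mean(v) if v else 0.0 for v in recent_returns.values()])
    p = np.exp((r - r.max()) / temperature)
    p /= p.sum()
    return names[rng.choice(len(names), p=p)]

print(schedule_task({"main_vision": [0.1], "aux_reach": [0.6]}))
```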
Learning robotic control policies in the real world gives rise to challenges in data efficiency, safety, and controlling the initial condition of the system. On the other hand, simulations are a useful alternative as they provide an abundant source of data without the restrictions of the real world. Unfortunately, simulations often fail to accurately model complex real-world phenomena. Traditional system identification techniques are limited in expressiveness by the analytical model parameters, and are usually not sufficient to capture such phenomena. In this paper we propose...
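A minimal sketch of one way to move past purely analytical system identification: fit a correction term on real transitions on top of the simulator's prediction. A linear least-squares residual stands in here for a more expressive learned model; all names are illustrative.

```python
import numpy as np

def analytical_model(states, actions, dt=0.01):
    """Idealized physics prediction (stand-in for a simulator step)."""
    return states + dt * actions

def fit_residual(states, actions, next_states_real):
    """Fit a correction on top of the analytical model from real-world
    transitions; linear least squares stands in for a learned network."""
    pred = analytical_model(states, actions)
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states_real - pred, rcond=None)
    return lambda s, a: (analytical_model(s, a)
                         + np.hstack([s, a, np.ones((len(s), 1))]) @ W)

rng = np.random.default_rng(0)
S, A = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))
S_next = analytical_model(S, A) + 0.03 * S - 0.05 * A   # unmodeled linear effects
hybrid = fit_residual(S, A, S_next)
print(np.abs(hybrid(S, A) - S_next).max())   # near zero: residual recovered
```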