Markus Wulfmeier

ORCID: 0000-0003-1802-4492
Research Areas
  • Reinforcement Learning in Robotics
  • Adversarial Robustness in Machine Learning
  • Robot Manipulation and Learning
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • Robotic Locomotion and Control
  • Robotic Path Planning Algorithms
  • Autonomous Vehicle Technology and Safety
  • Evolutionary Algorithms and Applications
  • Robotics and Sensor-Based Localization
  • Explainable Artificial Intelligence (XAI)
  • Machine Learning and Algorithms
  • Viral Infectious Diseases and Gene Expression in Insects
  • Multimodal Machine Learning Applications
  • Smart Grid Energy Management
  • Machine Learning and Data Classification
  • Model Reduction and Neural Networks
  • Muscle activation and electromyography studies
  • Neural Networks and Reservoir Computing
  • Generative Adversarial Networks and Image Synthesis
  • Neural Networks and Applications
  • Anomaly Detection Techniques and Applications
  • AI-based Problem Solving and Planning
  • Soil Mechanics and Vehicle Dynamics
  • Smart Grid Security and Resilience

Google (United Kingdom)
2024

DeepMind (United Kingdom)
2021-2024

Leibniz University Hannover
2015-2024

University College London
2023

Google (United States)
2018-2021

Corvallis Environmental Center
2020

University of Oxford
2016-2019

Science Oxford
2016-2017

Oxford Research Group
2016

Massachusetts Institute of Technology
2013

This paper presents a general framework for exploiting the representational capacity of neural networks to approximate complex, nonlinear reward functions in the context of solving the inverse reinforcement learning (IRL) problem. We show in this context that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures. At test time, the approach leads to a computational complexity independent of the number of demonstrations, which makes it especially well-suited for applications in life-long learning scenarios. Our...

10.48550/arxiv.1507.04888 preprint EN other-oa arXiv (Cornell University) 2015-01-01
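The core mechanic of Maximum Entropy IRL, matching expected state-visitation frequencies under a learned reward to those of the demonstrations, can be sketched on a toy chain MDP. This is a minimal illustration of the gradient update, not the paper's deep architecture: here the "network" degenerates to one reward parameter per state.

```python
import math

N, T, ITERS, LR = 5, 8, 150, 0.1   # chain states, horizon, training iterations, step size

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

def soft_policy(reward):
    """Finite-horizon soft value iteration; returns a time-indexed policy pi[t][s][a]."""
    V = [0.0] * N
    backups = []
    for _ in range(T):
        Q = [[reward[s] + V[step(s, a)] for a in (0, 1)] for s in range(N)]
        V = [max(q) + math.log(sum(math.exp(x - max(q)) for x in q)) for q in Q]
        backups.append([[math.exp(Q[s][a] - V[s]) for a in (0, 1)] for s in range(N)])
    return backups[::-1]   # policy for t = 0 .. T-1

def expected_visitation(policy):
    """Propagate the start-state distribution forward and sum visitation counts."""
    d = [1.0] + [0.0] * (N - 1)     # all trajectories start in state 0
    mu = list(d)
    for t in range(T - 1):
        nd = [0.0] * N
        for s in range(N):
            for a in (0, 1):
                nd[step(s, a)] += d[s] * policy[t][s][a]
        d = nd
        mu = [m + x for m, x in zip(mu, d)]
    return mu

# Expert demonstration: always move right, so visitation concentrates on the last state.
mu_demo, s = [1.0] + [0.0] * (N - 1), 0
for _ in range(T - 1):
    s = step(s, 1)
    mu_demo[s] += 1.0

reward = [0.0] * N
for _ in range(ITERS):
    mu = expected_visitation(soft_policy(reward))
    # MaxEnt IRL gradient: demonstration visitation minus expected visitation
    reward = [r + LR * (d - m) for r, d, m in zip(reward, mu_demo, mu)]
```

After training, the learned reward should favour the demonstrated goal state over the start state; in the deep variant, the same visitation-difference gradient is backpropagated through the reward network instead of updating a table.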

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or to insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required before the goal is reached and some learning signal received. Past approaches tackle these problems by exploiting expert demonstrations...

10.48550/arxiv.1707.05300 preprint EN other-oa arXiv (Cornell University) 2017-01-01
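One common way to exploit expert demonstrations under sparse rewards is to keep them permanently in the replay buffer and mix them into every minibatch. The sketch below illustrates that idea only; the class name, `demo_fraction` parameter, and sampling scheme are illustrative assumptions, not the paper's exact algorithm.

```python
import random

class MixedReplayBuffer:
    """Replay buffer that retains expert demonstrations permanently and
    mixes them with agent transitions in every sampled minibatch
    (a simplified sketch of the demonstrations-in-the-buffer idea)."""

    def __init__(self, capacity, demo_fraction=0.25):
        self.capacity = capacity          # cap applies to agent data only
        self.demo_fraction = demo_fraction
        self.demos = []                   # never evicted
        self.agent = []                   # FIFO eviction

    def add_demo(self, transition):
        self.demos.append(transition)

    def add(self, transition):
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)

    def sample(self, batch_size):
        # Reserve a fixed share of the batch for demonstration transitions.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(self.agent, min(batch_size - n_demo, len(self.agent)))
        return batch

buf = MixedReplayBuffer(capacity=1000)
for i in range(10):
    buf.add_demo(("demo", i))
for i in range(100):
    buf.add(("agent", i))
batch = buf.sample(8)
```

With `demo_fraction=0.25`, every batch of 8 contains 2 demonstration transitions, so the sparse-reward learning signal from the expert is never diluted away as agent experience accumulates.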

We investigated whether deep reinforcement learning (deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies. We used deep RL to train a humanoid robot to play a simplified one-versus-one soccer game. The resulting agent exhibits robust and dynamic movement skills, such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth and efficient manner. It also learned to anticipate ball movements and to block...

10.1126/scirobotics.adi8022 article EN Science Robotics 2024-04-10

We present an approach for learning spatial traversability maps for driving in complex, urban environments based on an extensive dataset demonstrating the driving behaviour of human experts. The direct end-to-end mapping from raw input data to cost bypasses the effort of manually designing parts of the pipeline, exploits a large number of training samples, and can additionally be used to refine handcrafted cost maps produced from manually hand-engineered features. To achieve this, we introduce a maximum-entropy-based, non-linear inverse reinforcement...

10.1177/0278364917722396 article EN The International Journal of Robotics Research 2017-08-04

In this work, we present an approach to learn cost maps for driving in complex urban environments from a large number of demonstrations of human driving behaviour. The learned cost maps are constructed directly from raw sensor measurements, bypassing the effort of manually designing cost maps as well as features. When deploying the learned cost maps, the trajectories generated not only replicate human-like driving behaviour but are also demonstrably robust against systematic errors in putative robot configuration. To achieve this, we deploy a Maximum Entropy based, non-linear...

10.1109/iros.2016.7759328 article EN 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2016-10-01

Continuous appearance shifts such as changes in weather and lighting conditions can impact the performance of deployed machine learning models. While unsupervised domain adaptation aims to address this challenge, current approaches do not utilise the continuity of the occurring shifts. In particular, many robotics applications exhibit these conditions and thus facilitate the potential to incrementally adapt a learnt model over minor shifts which integrate to massive differences over time. Our work presents an adversarial approach for...

10.1109/icra.2018.8460982 article EN 2018-05-01

Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed as instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has...

10.1126/scirobotics.abo0235 article EN Science Robotics 2022-08-31

This paper describes novel experimental methods aimed at understanding the fundamental phenomena governing the motion of lightweight vehicles on dry, granular soils. A single-wheel test rig is used to empirically investigate wheel performance under controlled slip and loading conditions in sandy, dry soil. Tests can be designed to replicate typical field scenarios for robots, while key operational parameters such as drawbar force, torque, and sinkage are measured. This...

10.4271/2024-01-3379 article EN SAE technical papers on CD-ROM/SAE technical paper series 2024-11-15

Appearance changes due to weather and seasonal conditions represent a strong impediment to the robust implementation of machine learning systems in outdoor robotics. While supervised learning optimises a model for the training domain, it will deliver degraded performance in application domains that underlie distributional shifts caused by these changes. Traditionally, this problem has been addressed via the collection of labelled data in multiple domains or by imposing priors on the type of shift between both domains. We frame the problem in the context...

10.1109/iros.2017.8205961 article EN 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017-09-01

Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully discrete or fully continuous action spaces. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used...

10.48550/arxiv.2001.00449 preprint EN other-oa arXiv (Cornell University) 2020-01-01
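A hybrid action space can be handled without discretising or relaxing it by factorising the policy into a categorical distribution over discrete modes and an independent Gaussian over continuous setpoints. The sketch below shows this factorised sampling; the parameterisation (softmax logits plus per-dimension mean and log-std) is a common illustrative choice, not necessarily the paper's exact one.

```python
import math, random

random.seed(0)

def sample_hybrid_action(logits, mean, log_std):
    """Sample from a factorised hybrid policy: a categorical over discrete
    modes times an independent Gaussian over continuous setpoints."""
    # Categorical over discrete modes via a numerically stable softmax
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    z = sum(probs)
    probs = [p / z for p in probs]
    u, mode, acc = random.random(), 0, 0.0
    for i, p in enumerate(probs):
        acc += p
        if u <= acc:
            mode = i
            break
    # Independent Gaussian per continuous action dimension
    cont = [random.gauss(mu, math.exp(ls)) for mu, ls in zip(mean, log_std)]
    return mode, cont

# Example: 3 discrete control modes, 2 continuous setpoints
mode, cont = sample_hybrid_action(logits=[2.0, 0.0, -1.0],
                                  mean=[0.5, -0.5], log_std=[-1.0, -1.0])
```

Because the two heads are independent given the state, the joint log-probability needed for policy-gradient updates is simply the sum of the categorical and Gaussian log-probabilities.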

We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural-looking behavior at the time of reuse. This makes it easy to create well-regularized,...

10.48550/arxiv.2203.17138 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based...

10.48550/arxiv.1803.01840 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Training robots for operation in the real world is a complex, time-consuming and potentially expensive task. Despite significant success of reinforcement learning in games and simulations, research in robot applications has not been able to match similar progress. While sample complexity can be reduced by training policies in simulation, such policies can perform sub-optimally on the real platform given imperfect calibration of model dynamics. We present an approach -- supplemental to fine tuning on the real robot -- to further benefit from parallel...

10.48550/arxiv.1707.07907 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Reinforcement learning (RL) for continuous control typically employs distributions whose support covers the entire action space. In this work, we investigate the colloquially known phenomenon that trained agents often prefer actions at the boundaries of that space. We draw theoretical connections to the emergence of bang-bang behavior in optimal control, and provide extensive empirical evaluation across a variety of recent RL algorithms. We replace the normal Gaussian by a Bernoulli distribution that solely considers the extremes along...

10.48550/arxiv.2111.02552 preprint EN other-oa arXiv (Cornell University) 2021-01-01
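Replacing the Gaussian head with a per-dimension Bernoulli over the action extremes is simple to sketch: each action dimension independently picks its lower or upper bound. This is a minimal sketch of the idea; the function name and interface are illustrative.

```python
import random

random.seed(0)

def bang_bang_action(p_high, low=-1.0, high=1.0):
    """Per-dimension Bernoulli policy over the action-space extremes,
    in place of the usual Gaussian head: dimension i takes the upper
    bound with probability p_high[i], else the lower bound."""
    return [high if random.random() < p else low for p in p_high]

# Example: a 3-dimensional action with different per-dimension probabilities
a = bang_bang_action([0.9, 0.1, 0.5])
```

The policy network then only has to output one probability per action dimension (e.g. via a sigmoid), and every sampled action lies on the boundary of the action space by construction.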

The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers as well as...

10.15607/rss.2020.xvi.054 article EN 2020-06-30
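One way to share off-policy transition data across tasks is to relabel every stored transition with each task's reward function, so that a single interaction trajectory provides training data for all tasks at once. The sketch below shows only this relabelling step under that assumption; it is a simplified illustration of cross-task data sharing, not RHPO itself, and the function and variable names are hypothetical.

```python
def share_across_tasks(transitions, task_rewards):
    """Relabel each (state, action, next_state) transition with every
    task's reward so one trajectory yields off-policy data for all tasks."""
    shared = []
    for (s, a, s_next) in transitions:
        for task_id, reward_fn in enumerate(task_rewards):
            shared.append((task_id, s, a, reward_fn(s_next), s_next))
    return shared

# Toy 1-D example with two transitions and two tasks
transitions = [(0.0, 1.0, 1.0), (1.0, -1.0, 0.0)]
task_rewards = [lambda s: 1.0 if s > 0.5 else 0.0,   # task 0: reach the right
                lambda s: 1.0 if s < 0.5 else 0.0]   # task 1: reach the left
data = share_across_tasks(transitions, task_rewards)
```

Each transition appears once per task with its task-specific reward, multiplying the effective amount of off-policy data without any additional platform time.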

Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement,...

10.48550/arxiv.2105.12196 preprint EN other-oa arXiv (Cornell University) 2021-01-01