Markus Wulfmeier

ORCID: 0000-0003-1802-4492
Research Areas
  • Reinforcement Learning in Robotics
  • Adversarial Robustness in Machine Learning
  • Robot Manipulation and Learning
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • Robotic Locomotion and Control
  • Robotic Path Planning Algorithms
  • Autonomous Vehicle Technology and Safety
  • Evolutionary Algorithms and Applications
  • Robotics and Sensor-Based Localization
  • Explainable Artificial Intelligence (XAI)
  • Machine Learning and Algorithms
  • Viral Infectious Diseases and Gene Expression in Insects
  • Multimodal Machine Learning Applications
  • Smart Grid Energy Management
  • Machine Learning and Data Classification
  • Model Reduction and Neural Networks
  • Muscle activation and electromyography studies
  • Neural Networks and Reservoir Computing
  • Generative Adversarial Networks and Image Synthesis
  • Neural Networks and Applications
  • Anomaly Detection Techniques and Applications
  • AI-based Problem Solving and Planning
  • Soil Mechanics and Vehicle Dynamics
  • Smart Grid Security and Resilience

Google (United Kingdom)
2024

DeepMind (United Kingdom)
2021-2024

Leibniz University Hannover
2015-2024

University College London
2023

Google (United States)
2018-2021

Corvallis Environmental Center
2020

University of Oxford
2016-2019

Science Oxford
2016-2017

Oxford Research Group
2016

Massachusetts Institute of Technology
2013

This paper presents a general framework for exploiting the representational capacity of neural networks to approximate complex, nonlinear reward functions in the context of solving the inverse reinforcement learning (IRL) problem. We show in this context that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures. At test time, the approach leads to a computational complexity independent of the number of demonstrations, which makes it especially well-suited for applications in life-long learning scenarios. Our...

10.48550/arxiv.1507.04888 preprint EN other-oa arXiv (Cornell University) 2015-01-01
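The core mechanic of Maximum Entropy IRL, matching expected state-visitation frequencies under a learned reward to those of the demonstrations, can be sketched on a toy chain MDP. This is a minimal illustration of the gradient update, not the paper's deep architecture: here the "network" degenerates to one reward parameter per state.

```python
import math

N, T, ITERS, LR = 5, 8, 150, 0.1   # chain states, horizon, training iterations, step size

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

def soft_policy(reward):
    """Finite-horizon soft value iteration; returns a time-indexed policy pi[t][s][a]."""
    V = [0.0] * N
    backups = []
    for _ in range(T):
        Q = [[reward[s] + V[step(s, a)] for a in (0, 1)] for s in range(N)]
        V = [max(q) + math.log(sum(math.exp(x - max(q)) for x in q)) for q in Q]
        backups.append([[math.exp(Q[s][a] - V[s]) for a in (0, 1)] for s in range(N)])
    return backups[::-1]   # policy for t = 0 .. T-1

def expected_visitation(policy):
    """Propagate the start-state distribution forward and sum visitation counts."""
    d = [1.0] + [0.0] * (N - 1)     # all trajectories start in state 0
    mu = list(d)
    for t in range(T - 1):
        nd = [0.0] * N
        for s in range(N):
            for a in (0, 1):
                nd[step(s, a)] += d[s] * policy[t][s][a]
        d = nd
        mu = [m + x for m, x in zip(mu, d)]
    return mu

# Expert demonstration: always move right, so visitation concentrates on the last state.
mu_demo, s = [1.0] + [0.0] * (N - 1), 0
for _ in range(T - 1):
    s = step(s, 1)
    mu_demo[s] += 1.0

reward = [0.0] * N
for _ in range(ITERS):
    mu = expected_visitation(soft_policy(reward))
    # MaxEnt IRL gradient: demonstration visitation minus expected visitation
    reward = [r + LR * (d - m) for r, d, m in zip(reward, mu_demo, mu)]
```

After training, the learned reward should favour the demonstrated goal state over the start state; in the deep variant, the same visitation-difference gradient is backpropagated through the reward network instead of updating a table.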

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or to insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required before the goal is reached and some learning signal received. Past approaches tackle these problems by exploiting expert demonstrations...

10.48550/arxiv.1707.05300 preprint EN other-oa arXiv (Cornell University) 2017-01-01
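One common way to exploit expert demonstrations under sparse rewards is to keep them permanently in the replay buffer and mix them into every minibatch. The sketch below illustrates that idea only; the class name, `demo_fraction` parameter, and sampling scheme are illustrative assumptions, not the paper's exact algorithm.

```python
import random

class MixedReplayBuffer:
    """Replay buffer that retains expert demonstrations permanently and
    mixes them with agent transitions in every sampled minibatch
    (a simplified sketch of the demonstrations-in-the-buffer idea)."""

    def __init__(self, capacity, demo_fraction=0.25):
        self.capacity = capacity          # cap applies to agent data only
        self.demo_fraction = demo_fraction
        self.demos = []                   # never evicted
        self.agent = []                   # FIFO eviction

    def add_demo(self, transition):
        self.demos.append(transition)

    def add(self, transition):
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)

    def sample(self, batch_size):
        # Reserve a fixed share of the batch for demonstration transitions.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(self.agent, min(batch_size - n_demo, len(self.agent)))
        return batch

buf = MixedReplayBuffer(capacity=1000)
for i in range(10):
    buf.add_demo(("demo", i))
for i in range(100):
    buf.add(("agent", i))
batch = buf.sample(8)
```

With `demo_fraction=0.25`, every batch of 8 contains 2 demonstration transitions, so the sparse-reward learning signal from the expert is never diluted away as agent experience accumulates.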

We investigated whether deep reinforcement learning (deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies. We used deep RL to train a humanoid robot to play a simplified one-versus-one soccer game. The resulting agent exhibits robust and dynamic movement skills, such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth and efficient manner. It also learned to anticipate ball movements and to block...

10.1126/scirobotics.adi8022 article EN Science Robotics 2024-04-10

We present an approach for learning spatial traversability maps for driving in complex, urban environments based on an extensive dataset demonstrating the driving behaviour of human experts. The direct end-to-end mapping from raw input data to cost bypasses the effort of manually designing parts of the pipeline, exploits a large number of training samples, and can additionally be used to refine handcrafted cost maps produced from manually hand-engineered features. To achieve this, we introduce a maximum-entropy-based, non-linear inverse reinforcement...

10.1177/0278364917722396 article EN The International Journal of Robotics Research 2017-08-04

In this work, we present an approach to learn cost maps for driving in complex urban environments from a large number of demonstrations of human driving behaviour. The learned cost maps are constructed directly from raw sensor measurements, bypassing the effort of manually designing cost maps as well as features. When deploying the learned cost maps, the trajectories generated not only replicate human-like driving behaviour but are also demonstrably robust against systematic errors in putative robot configuration. To achieve this, we deploy a Maximum Entropy based, non-linear...

10.1109/iros.2016.7759328 article EN 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2016-10-01

Continuous appearance shifts such as changes in weather and lighting conditions can impact the performance of deployed machine learning models. While unsupervised domain adaptation aims to address this challenge, current approaches do not utilise the continuity of the occurring shifts. In particular, many robotics applications exhibit these conditions and thus facilitate the potential to incrementally adapt a learnt model over minor shifts which integrate to massive differences over time. Our work presents an adversarial approach for...

10.1109/icra.2018.8460982 article EN 2018-05-01

Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed as instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has...

10.1126/scirobotics.abo0235 article EN Science Robotics 2022-08-31

This paper describes novel experimental methods aimed at understanding the fundamental phenomena governing the motion of lightweight vehicles on dry, granular soils. A single-wheel test rig is used to empirically investigate wheel performance under controlled slip and loading conditions in sandy, dry soil. Tests can be designed to replicate typical field scenarios for robots, while key operational parameters such as drawbar force, torque, and sinkage are measured. This...

10.4271/2024-01-3379 article EN SAE technical papers on CD-ROM/SAE technical paper series 2024-11-15

Appearance changes due to weather and seasonal conditions represent a strong impediment to the robust implementation of machine learning systems in outdoor robotics. While supervised learning optimises a model for the training domain, it will deliver degraded performance in application domains that underlie distributional shifts caused by these changes. Traditionally, this problem has been addressed via the collection of labelled data in multiple domains or by imposing priors on the type of shift between both domains. We frame the problem in the context...

10.1109/iros.2017.8205961 article EN 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017-09-01

Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully discrete or fully continuous action spaces. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used...

10.48550/arxiv.2001.00449 preprint EN other-oa arXiv (Cornell University) 2020-01-01
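A hybrid action space can be handled without discretising or relaxing it by factorising the policy into a categorical distribution over discrete modes and an independent Gaussian over continuous setpoints. The sketch below shows this factorised sampling; the parameterisation (softmax logits plus per-dimension mean and log-std) is a common illustrative choice, not necessarily the paper's exact one.

```python
import math, random

random.seed(0)

def sample_hybrid_action(logits, mean, log_std):
    """Sample from a factorised hybrid policy: a categorical over discrete
    modes times an independent Gaussian over continuous setpoints."""
    # Categorical over discrete modes via a numerically stable softmax
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    z = sum(probs)
    probs = [p / z for p in probs]
    u, mode, acc = random.random(), 0, 0.0
    for i, p in enumerate(probs):
        acc += p
        if u <= acc:
            mode = i
            break
    # Independent Gaussian per continuous action dimension
    cont = [random.gauss(mu, math.exp(ls)) for mu, ls in zip(mean, log_std)]
    return mode, cont

# Example: 3 discrete control modes, 2 continuous setpoints
mode, cont = sample_hybrid_action(logits=[2.0, 0.0, -1.0],
                                  mean=[0.5, -0.5], log_std=[-1.0, -1.0])
```

Because the two heads are independent given the state, the joint log-probability needed for policy-gradient updates is simply the sum of the categorical and Gaussian log-probabilities.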

We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural-looking behavior at the time of reuse. This makes it easy to create well-regularized,...

10.48550/arxiv.2203.17138 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based...

10.48550/arxiv.1803.01840 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Training robots for operation in the real world is a complex, time-consuming and potentially expensive task. Despite significant success of reinforcement learning in games and simulations, research in robot applications has not been able to match similar progress. While sample complexity can be reduced by training policies in simulation, such policies can perform sub-optimally on the real platform given imperfect calibration of model dynamics. We present an approach -- supplemental to fine tuning on the real robot -- to further benefit from parallel...

10.48550/arxiv.1707.07907 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Reinforcement learning (RL) for continuous control typically employs distributions whose support covers the entire action space. In this work, we investigate the colloquially known phenomenon that trained agents often prefer actions at the boundaries of that space. We draw theoretical connections to the emergence of bang-bang behavior in optimal control, and provide extensive empirical evaluation across a variety of recent RL algorithms. We replace the normal Gaussian by a Bernoulli distribution that solely considers the extremes along...

10.48550/arxiv.2111.02552 preprint EN other-oa arXiv (Cornell University) 2021-01-01
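Replacing the Gaussian head with a per-dimension Bernoulli over the action extremes is simple to sketch: each action dimension independently picks its lower or upper bound. This is a minimal sketch of the idea; the function name and interface are illustrative.

```python
import random

random.seed(0)

def bang_bang_action(p_high, low=-1.0, high=1.0):
    """Per-dimension Bernoulli policy over the action-space extremes,
    in place of the usual Gaussian head: dimension i takes the upper
    bound with probability p_high[i], else the lower bound."""
    return [high if random.random() < p else low for p in p_high]

# Example: a 3-dimensional action with different per-dimension probabilities
a = bang_bang_action([0.9, 0.1, 0.5])
```

The policy network then only has to output one probability per action dimension (e.g. via a sigmoid), and every sampled action lies on the boundary of the action space by construction.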

The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers as well as...

10.15607/rss.2020.xvi.054 article EN 2020-06-30
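One way to share off-policy transition data across tasks is to relabel every stored transition with each task's reward function, so that a single interaction trajectory provides training data for all tasks at once. The sketch below shows only this relabelling step under that assumption; it is a simplified illustration of cross-task data sharing, not RHPO itself, and the function and variable names are hypothetical.

```python
def share_across_tasks(transitions, task_rewards):
    """Relabel each (state, action, next_state) transition with every
    task's reward so one trajectory yields off-policy data for all tasks."""
    shared = []
    for (s, a, s_next) in transitions:
        for task_id, reward_fn in enumerate(task_rewards):
            shared.append((task_id, s, a, reward_fn(s_next), s_next))
    return shared

# Toy 1-D example with two transitions and two tasks
transitions = [(0.0, 1.0, 1.0), (1.0, -1.0, 0.0)]
task_rewards = [lambda s: 1.0 if s > 0.5 else 0.0,   # task 0: reach the right
                lambda s: 1.0 if s < 0.5 else 0.0]   # task 1: reach the left
data = share_across_tasks(transitions, task_rewards)
```

Each transition appears once per task with its task-specific reward, multiplying the effective amount of off-policy data without any additional platform time.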

Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement,...

10.48550/arxiv.2105.12196 preprint EN other-oa arXiv (Cornell University) 2021-01-01