PrefMMT: Modeling Human Preferences in Preference-based Reinforcement Learning with Multimodal Transformers
FOS: Computer and information sciences
Computer Science - Robotics
Robotics (cs.RO)
DOI:
10.48550/arxiv.2409.13683
Publication Date:
2024-09-20
AUTHORS (7)
ABSTRACT
Preference-based reinforcement learning (PbRL) shows promise in aligning robot behaviors with human preferences, but its success depends heavily on the accurate modeling of human preferences through reward models. Most methods adopt Markovian assumptions for preference modeling (PM), which overlook the temporal dependencies within robot behavior trajectories that impact human evaluations. While recent works have utilized sequence modeling to mitigate this by learning sequential, non-Markovian rewards, they ignore the multimodal nature of robot trajectories, which consist of elements from two distinctive modalities: state and action. As a result, they often struggle to capture the complex interplay between these modalities that significantly shapes human preferences. In this paper, we propose a multimodal sequence modeling approach for PM by disentangling the state and action modalities. We introduce a multimodal transformer network, named PrefMMT, which hierarchically leverages intra-modal temporal dependencies and inter-modal state-action interactions to capture complex preference patterns. We demonstrate that PrefMMT consistently outperforms state-of-the-art PM baselines on locomotion tasks from the D4RL benchmark and manipulation tasks from the Meta-World benchmark.
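To illustrate the general setup the abstract describes, the sketch below shows a minimal non-Markovian preference model that embeds state and action tokens through separate projections, mixes them with a standard transformer encoder, and is trained with the Bradley-Terry preference loss commonly used in PbRL. This is a hypothetical illustration under our own simplifying assumptions, not the authors' PrefMMT architecture or released code; all class and function names here are made up for the example.

```python
# Hedged sketch: a transformer-based, non-Markovian preference model over
# state-action trajectories. Not the authors' implementation of PrefMMT.
import torch
import torch.nn as nn

class MultimodalPreferenceModel(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Separate projections keep the state and action modalities distinct
        # before any cross-modal mixing.
        self.state_embed = nn.Linear(state_dim, d_model)
        self.action_embed = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reward_head = nn.Linear(d_model, 1)

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        s = self.state_embed(states)
        a = self.action_embed(actions)
        # Interleave state and action tokens so self-attention can capture both
        # temporal dependencies within each modality and state-action interactions.
        tokens = torch.stack([s, a], dim=2).flatten(1, 2)  # (B, 2T, d_model)
        h = self.encoder(tokens)
        # Per-token rewards summed into a trajectory-level score.
        return self.reward_head(h).squeeze(-1).sum(dim=1)  # (B,)

def bradley_terry_loss(model, seg0, seg1, prefs):
    """Standard PbRL preference loss: prefs[i] = 1 if segment 1 is preferred."""
    r0 = model(*seg0)
    r1 = model(*seg1)
    logits = torch.stack([r0, r1], dim=1)  # (B, 2)
    return nn.functional.cross_entropy(logits, prefs)

if __name__ == "__main__":
    # Toy usage on random data with arbitrary dimensions.
    B, T, S, A = 8, 25, 17, 6
    model = MultimodalPreferenceModel(S, A)
    seg0 = (torch.randn(B, T, S), torch.randn(B, T, A))
    seg1 = (torch.randn(B, T, S), torch.randn(B, T, A))
    prefs = torch.randint(0, 2, (B,))
    loss = bradley_terry_loss(model, seg0, seg1, prefs)
    loss.backward()
    print(loss.item())
```

The learned trajectory score plays the role of the reward model: segments preferred by the human annotator should receive higher summed scores, and the cross-entropy over the two segment scores is exactly the Bradley-Terry objective.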