PrefMMT: Modeling Human Preferences in Preference-based Reinforcement Learning with Multimodal Transformers
FOS: Computer and information sciences
Computer Science - Robotics
Robotics (cs.RO)
DOI:
10.48550/arxiv.2409.13683
Publication Date:
2024-09-20
AUTHORS (7)
ABSTRACT
Preference-based reinforcement learning (PbRL) shows promise in aligning robot behaviors with human preferences, but its success depends heavily on the accurate modeling of human preferences through reward models. Most methods adopt Markovian assumptions for preference modeling (PM), which overlook the temporal dependencies within robot behavior trajectories that impact human evaluations. While recent works have utilized sequence modeling to mitigate this by learning sequential, non-Markovian rewards, they ignore the multimodal nature of robot trajectories, which consist of elements from two distinctive modalities: state and action. As a result, they often struggle to capture the complex interplay between these modalities that significantly shapes human preferences. In this paper, we propose a multimodal sequence modeling approach for PM by disentangling the state and action modalities. We introduce a multimodal transformer network, named PrefMMT, which hierarchically leverages intra-modal temporal dependencies and inter-modal state-action interactions to capture complex preference patterns. We demonstrate that PrefMMT consistently outperforms state-of-the-art PM baselines on locomotion tasks from the D4RL benchmark and manipulation tasks from the Meta-World benchmark.
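To illustrate the general setup the abstract describes, the sketch below shows a minimal non-Markovian preference model that embeds state and action tokens through separate projections, mixes them with a standard transformer encoder, and is trained with the Bradley-Terry preference loss commonly used in PbRL. This is a hypothetical illustration under our own simplifying assumptions, not the authors' PrefMMT architecture or released code; all class and function names here are made up for the example.

```python
# Hedged sketch: a transformer-based, non-Markovian preference model over
# state-action trajectories. Not the authors' implementation of PrefMMT.
import torch
import torch.nn as nn

class MultimodalPreferenceModel(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Separate projections keep the state and action modalities distinct
        # before any cross-modal mixing.
        self.state_embed = nn.Linear(state_dim, d_model)
        self.action_embed = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reward_head = nn.Linear(d_model, 1)

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        s = self.state_embed(states)
        a = self.action_embed(actions)
        # Interleave state and action tokens so self-attention can capture both
        # temporal dependencies within each modality and state-action interactions.
        tokens = torch.stack([s, a], dim=2).flatten(1, 2)  # (B, 2T, d_model)
        h = self.encoder(tokens)
        # Per-token rewards summed into a trajectory-level score.
        return self.reward_head(h).squeeze(-1).sum(dim=1)  # (B,)

def bradley_terry_loss(model, seg0, seg1, prefs):
    """Standard PbRL preference loss: prefs[i] = 1 if segment 1 is preferred."""
    r0 = model(*seg0)
    r1 = model(*seg1)
    logits = torch.stack([r0, r1], dim=1)  # (B, 2)
    return nn.functional.cross_entropy(logits, prefs)

if __name__ == "__main__":
    # Toy usage on random data with arbitrary dimensions.
    B, T, S, A = 8, 25, 17, 6
    model = MultimodalPreferenceModel(S, A)
    seg0 = (torch.randn(B, T, S), torch.randn(B, T, A))
    seg1 = (torch.randn(B, T, S), torch.randn(B, T, A))
    prefs = torch.randint(0, 2, (B,))
    loss = bradley_terry_loss(model, seg0, seg1, prefs)
    loss.backward()
    print(loss.item())
```

The learned trajectory score plays the role of the reward model: segments preferred by the human annotator should receive higher summed scores, and the cross-entropy over the two segment scores is exactly the Bradley-Terry objective.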