MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
DOI: 10.48550/arxiv.2406.16258
Publication Date: 2024-06-23
AUTHORS (7)
ABSTRACT
Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. Instead of inferring the complete human behavior characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function. Extensive evaluations on simulated and real-world tasks demonstrate that MEReQ achieves sample-efficient policy alignment from human intervention.
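The abstract compresses two technical steps: inferring a residual reward with maximum-entropy inverse RL, and composing it with the prior policy via Residual Q-Learning (RQL). The recursion below is a sketch reconstructed from standard maximum-entropy RL definitions; the notation (r_res, Q_R, temperature alpha) is ours, not necessarily the paper's. Assume the prior policy pi_prior is max-ent optimal for an unknown reward r_prior, i.e. pi_prior(a|s) proportional to exp(Q_prior(s,a)/alpha), and that the expert's reward decomposes as r_expert = r_prior + r_res. Subtracting the two soft Bellman equations and defining Q_R = Q_expert - Q_prior eliminates the unknown r_prior:

```latex
Q_R(s,a) = r_{\mathrm{res}}(s,a)
  + \gamma \,\mathbb{E}_{s'}\!\left[
      \alpha \log \sum_{a'} \pi_{\mathrm{prior}}(a' \mid s')
      \exp\!\big(Q_R(s',a')/\alpha\big)
    \right],
\qquad
\pi(a \mid s) \propto \pi_{\mathrm{prior}}(a \mid s)\,
  \exp\!\big(Q_R(s,a)/\alpha\big).
```

So the aligned policy needs only Q_R and samples from the frozen prior policy; r_prior is never recovered. The sketch below shows, under the same assumptions, what one iteration might look like with a linear residual reward and discrete actions: a max-ent IRL gradient step that matches feature expectations between human intervention segments and the policy's own rollouts, followed by the RQL soft Bellman target. Function and variable names are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def residual_irl_step(theta, phi_expert, phi_policy, lr=1e-2):
    """One max-ent IRL gradient step on a linear residual reward
    r_res(s, a) = theta @ phi(s, a).

    phi_expert: (N, d) features from human intervention segments.
    phi_policy: (M, d) features from the current policy's own rollouts.
    Gradient = E_expert[phi] - E_policy[phi] (feature-expectation matching).
    """
    grad = phi_expert.mean(axis=0) - phi_policy.mean(axis=0)
    return theta + lr * grad

def rql_target(rew_res, q_res_next, logp_prior_next, gamma=0.99, alpha=0.1):
    """Soft Bellman target for the residual Q-function (discrete actions).

    rew_res:         (B,)   residual rewards for sampled transitions.
    q_res_next:      (B, A) residual Q-values at the next states.
    logp_prior_next: (B, A) log pi_prior(a | s') from the frozen prior.
    Computes r_res + gamma * alpha * log sum_a pi_prior * exp(Q_R / alpha).
    """
    soft_v = alpha * np.log(
        np.sum(np.exp(logp_prior_next + q_res_next / alpha), axis=1)
    )
    return rew_res + gamma * soft_v
```

Because only the discrepancy r_res is learned, the human needs to intervene only where the prior policy already departs from their preference, which is the source of the sample efficiency the abstract claims.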