MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
DOI: 10.48550/arxiv.2406.16258
Publication Date: 2024-06-23
AUTHORS (7)
ABSTRACT
Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. Instead of inferring the complete human behavior characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function. Extensive evaluations on simulated and real-world tasks demonstrate that MEReQ achieves sample-efficient policy alignment from human intervention.
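The abstract compresses two technical steps: inferring a residual reward with maximum-entropy inverse RL, and composing it with the prior policy via Residual Q-Learning (RQL). The recursion below is a sketch reconstructed from standard maximum-entropy RL definitions; the notation (r_res, Q_R, temperature alpha) is ours, not necessarily the paper's. Assume the prior policy pi_prior is max-ent optimal for an unknown reward r_prior, i.e. pi_prior(a|s) proportional to exp(Q_prior(s,a)/alpha), and that the expert's reward decomposes as r_expert = r_prior + r_res. Subtracting the two soft Bellman equations and defining Q_R = Q_expert - Q_prior eliminates the unknown r_prior:

```latex
Q_R(s,a) = r_{\mathrm{res}}(s,a)
  + \gamma \,\mathbb{E}_{s'}\!\left[
      \alpha \log \sum_{a'} \pi_{\mathrm{prior}}(a' \mid s')
      \exp\!\big(Q_R(s',a')/\alpha\big)
    \right],
\qquad
\pi(a \mid s) \propto \pi_{\mathrm{prior}}(a \mid s)\,
  \exp\!\big(Q_R(s,a)/\alpha\big).
```

So the aligned policy needs only Q_R and samples from the frozen prior policy; r_prior is never recovered. The sketch below shows, under the same assumptions, what one iteration might look like with a linear residual reward and discrete actions: a max-ent IRL gradient step that matches feature expectations between human intervention segments and the policy's own rollouts, followed by the RQL soft Bellman target. Function and variable names are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def residual_irl_step(theta, phi_expert, phi_policy, lr=1e-2):
    """One max-ent IRL gradient step on a linear residual reward
    r_res(s, a) = theta @ phi(s, a).

    phi_expert: (N, d) features from human intervention segments.
    phi_policy: (M, d) features from the current policy's own rollouts.
    Gradient = E_expert[phi] - E_policy[phi] (feature-expectation matching).
    """
    grad = phi_expert.mean(axis=0) - phi_policy.mean(axis=0)
    return theta + lr * grad

def rql_target(rew_res, q_res_next, logp_prior_next, gamma=0.99, alpha=0.1):
    """Soft Bellman target for the residual Q-function (discrete actions).

    rew_res:         (B,)   residual rewards for sampled transitions.
    q_res_next:      (B, A) residual Q-values at the next states.
    logp_prior_next: (B, A) log pi_prior(a | s') from the frozen prior.
    Computes r_res + gamma * alpha * log sum_a pi_prior * exp(Q_R / alpha).
    """
    soft_v = alpha * np.log(
        np.sum(np.exp(logp_prior_next + q_res_next / alpha), axis=1)
    )
    return rew_res + gamma * soft_v
```

Because only the discrepancy r_res is learned, the human needs to intervene only where the prior policy already departs from their preference, which is the source of the sample efficiency the abstract claims.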