B-Pref: Benchmarking Preference-Based Reinforcement Learning
Benchmarking
Preference learning
Robustness
DOI:
10.48550/arxiv.2111.03026
Publication Date:
2021-01-01
AUTHORS (4)
ABSTRACT
Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously difficult to specify for complex tasks. Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. However, it is difficult to quantify progress in preference-based RL due to the lack of a commonly adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially designed for preference-based RL. A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly, which makes relying on real human input for evaluation prohibitive. At the same time, simulating human input as giving perfect preferences for the ground-truth reward function is unrealistic. B-Pref alleviates this by simulating teachers with a wide array of irrationalities, and proposes metrics that capture not only performance but also robustness to these potential irrationalities. We showcase the utility of B-Pref by using it to analyze algorithmic design choices, such as selecting informative queries, in state-of-the-art preference-based RL algorithms. We hope that B-Pref can serve as a common starting point to study preference-based RL more systematically. Source code is available at https://github.com/rll-research/B-Pref.
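The core idea described above, a simulated teacher whose preferences over pairs of trajectory segments are derived from the ground-truth reward but perturbed by configurable irrationalities, can be sketched as follows. This is a minimal illustration assuming a Bradley-Terry style preference model with rationality, myopia, label-noise, skip, and equal-preference parameters; the function and parameter names are illustrative and do not reflect the exact B-Pref API.

```python
import numpy as np

def simulated_teacher_preference(
    rewards_a,            # per-step ground-truth rewards of segment A
    rewards_b,            # per-step ground-truth rewards of segment B
    beta=1.0,             # rationality: larger beta approaches a perfect teacher
    gamma=1.0,            # myopia: gamma < 1 weights recent steps more heavily
    eps=0.0,              # probability of flipping the sampled label
    skip_threshold=0.0,   # skip the query if both segments look uninformative
    equal_threshold=0.0,  # report a tie if the segments are too close
    rng=None,
):
    """Return 0 (prefer A), 1 (prefer B), 0.5 (equal), or None (skip)."""
    rng = rng or np.random.default_rng()

    # Recency-weighted (myopic) returns of each segment.
    weights_a = gamma ** np.arange(len(rewards_a))[::-1]
    weights_b = gamma ** np.arange(len(rewards_b))[::-1]
    ret_a = float(np.sum(weights_a * np.asarray(rewards_a)))
    ret_b = float(np.sum(weights_b * np.asarray(rewards_b)))

    # Skip queries where neither segment reaches the skip threshold.
    if max(ret_a, ret_b) < skip_threshold:
        return None
    # Declare a tie when the two segments are nearly indistinguishable.
    if abs(ret_a - ret_b) < equal_threshold:
        return 0.5

    # Stochastic (Bradley-Terry style) preference with rationality beta.
    p_prefer_b = 1.0 / (1.0 + np.exp(-beta * (ret_b - ret_a)))
    label = int(rng.random() < p_prefer_b)

    # Occasionally flip the label to model teacher mistakes.
    if rng.random() < eps:
        label = 1 - label
    return label
```

Sweeping these parameters yields a family of teachers (e.g., perfectly rational, noisy, myopic, or mistake-prone), which is the kind of variation the benchmark uses to test the robustness of candidate preference-based RL algorithms.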