B-Pref: Benchmarking Preference-Based Reinforcement Learning
Benchmarking
Preference learning
Robustness
DOI:
10.48550/arxiv.2111.03026
Publication Date:
2021-01-01
AUTHORS (4)
ABSTRACT
Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously difficult to specify for complex tasks. Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. However, it is difficult to quantify progress in preference-based RL due to the lack of a commonly adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially designed for preference-based RL. A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly, which makes relying on real human input for evaluation prohibitive. At the same time, simulating human input as giving perfect preferences for the ground-truth reward function is unrealistic. B-Pref alleviates this by simulating teachers with a wide array of irrationalities, and proposes metrics that capture not only performance but also robustness to these potential irrationalities. We showcase the utility of B-Pref by using it to analyze algorithmic design choices, such as selecting informative queries, in state-of-the-art preference-based RL algorithms. We hope that B-Pref can serve as a common starting point to study preference-based RL more systematically. Source code is available at https://github.com/rll-research/B-Pref.
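The core idea described above, a simulated teacher whose preferences over pairs of trajectory segments are derived from the ground-truth reward but perturbed by configurable irrationalities, can be sketched as follows. This is a minimal illustration assuming a Bradley-Terry style preference model with rationality, myopia, label-noise, skip, and equal-preference parameters; the function and parameter names are illustrative and do not reflect the exact B-Pref API.

```python
import numpy as np

def simulated_teacher_preference(
    rewards_a,            # per-step ground-truth rewards of segment A
    rewards_b,            # per-step ground-truth rewards of segment B
    beta=1.0,             # rationality: larger beta approaches a perfect teacher
    gamma=1.0,            # myopia: gamma < 1 weights recent steps more heavily
    eps=0.0,              # probability of flipping the sampled label
    skip_threshold=0.0,   # skip the query if both segments look uninformative
    equal_threshold=0.0,  # report a tie if the segments are too close
    rng=None,
):
    """Return 0 (prefer A), 1 (prefer B), 0.5 (equal), or None (skip)."""
    rng = rng or np.random.default_rng()

    # Recency-weighted (myopic) returns of each segment.
    weights_a = gamma ** np.arange(len(rewards_a))[::-1]
    weights_b = gamma ** np.arange(len(rewards_b))[::-1]
    ret_a = float(np.sum(weights_a * np.asarray(rewards_a)))
    ret_b = float(np.sum(weights_b * np.asarray(rewards_b)))

    # Skip queries where neither segment reaches the skip threshold.
    if max(ret_a, ret_b) < skip_threshold:
        return None
    # Declare a tie when the two segments are nearly indistinguishable.
    if abs(ret_a - ret_b) < equal_threshold:
        return 0.5

    # Stochastic (Bradley-Terry style) preference with rationality beta.
    p_prefer_b = 1.0 / (1.0 + np.exp(-beta * (ret_b - ret_a)))
    label = int(rng.random() < p_prefer_b)

    # Occasionally flip the label to model teacher mistakes.
    if rng.random() < eps:
        label = 1 - label
    return label
```

Sweeping these parameters yields a family of teachers (e.g., perfectly rational, noisy, myopic, or mistake-prone), which is the kind of variation the benchmark uses to test the robustness of candidate preference-based RL algorithms.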