Is RLHF More Difficult than Standard RL?

FOS: Computer and information sciences. Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
DOI: 10.48550/arxiv.2306.14111 Publication Date: 2023-01-01
ABSTRACT
Reinforcement Learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) learns directly from reward signals. Preferences arguably contain less information than rewards, which makes preference-based RL seemingly more difficult. This paper theoretically proves that, for a wide range of preference models, preference-based RL can be solved using existing algorithms and techniques for reward-based RL, with small or no extra cost. Specifically, (1) for preferences drawn from reward-based probabilistic models, the problem reduces to robust reward-based RL that can tolerate small errors in rewards; (2) for general arbitrary preferences, where the objective is to find the von Neumann winner, the problem reduces to multiagent reward-based RL that finds Nash equilibria of factored Markov games under a restricted set of policies. The latter case can be further reduced to an adversarial MDP when preferences depend only on the final state. All reward-based RL subroutines are instantiated with concrete, provable algorithms, and the theory applies to a large class of models including tabular MDPs and MDPs with generic function approximation. Guarantees are also provided when K-wise comparisons are available.
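
ILLUSTRATIVE EXAMPLE
The sketch below is not from the paper; it is a minimal illustration of the idea behind reduction (1), assuming a Bradley-Terry-style model as one example of a reward-based probabilistic preference model. Pairwise preferences are fit by maximum likelihood to recover rewards up to bounded error, and those estimated rewards would then be handed to a robust reward-based RL routine. All names and parameters here are hypothetical.

# Minimal sketch (assumed Bradley-Terry preference model, not the paper's algorithm):
# recover per-state rewards from pairwise preferences, then a robust reward-based
# RL algorithm could consume the estimated (slightly noisy) rewards.
import numpy as np

rng = np.random.default_rng(0)

n_states = 5                          # tiny tabular setting
true_r = rng.normal(size=n_states)    # unknown per-state reward

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Preference data: each sample compares two single-state "trajectories" i, j and
# records whether i was preferred, with probability sigmoid(r[i] - r[j]).
n_samples = 5000
i_idx = rng.integers(n_states, size=n_samples)
j_idx = rng.integers(n_states, size=n_samples)
prefs = (rng.random(n_samples) < sigmoid(true_r[i_idx] - true_r[j_idx])).astype(float)

# Maximum-likelihood reward estimation (logistic regression on reward gaps).
r_hat = np.zeros(n_states)
lr = 1.0
for _ in range(2000):
    p = sigmoid(r_hat[i_idx] - r_hat[j_idx])
    grad = np.zeros(n_states)
    np.add.at(grad, i_idx, prefs - p)      # d log-likelihood / d r[i]
    np.add.at(grad, j_idx, -(prefs - p))   # d log-likelihood / d r[j]
    r_hat += lr * grad / n_samples         # averaged gradient ascent step

# Rewards are identified only up to an additive constant; compare centered values.
true_c = true_r - true_r.mean()
est_c = r_hat - r_hat.mean()
print("max reward estimation error:", np.abs(true_c - est_c).max())

The small residual error in the estimated rewards is exactly the kind of bounded reward perturbation that reduction (1) requires the downstream reward-based RL algorithm to be robust against.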