SLiC-HF: Sequence Likelihood Calibration with Human Feedback
DOI:
10.48550/arxiv.2305.10425
Publication Date:
2023-05-17
AUTHORS (6)
Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu
ABSTRACT
Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the language model using reward scores assigned by a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC) can also be used to effectively learn from human preferences (SLiC-HF). Furthermore, we demonstrate this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves supervised fine-tuning baselines. SLiC-HF also presents a competitive alternative to the PPO RLHF implementation used in past work, while being much simpler to implement, easier to tune and more computationally efficient in practice.
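To make the method concrete: SLiC-HF trains on pairs of a preferred and a dispreferred continuation, pushing the model's sequence likelihood of the preferred one above the other by a margin, with a regularizer that keeps the model close to its supervised fine-tuning behavior. The sketch below is a minimal PyTorch rendering of such a pairwise rank-calibration loss; the function name, the default hyperparameters, and the assumption that per-sequence log-probabilities are precomputed are illustrative choices, not details taken from the paper.

import torch

def calibration_loss(logp_pos, logp_neg, logp_ref, delta=1.0, lam=0.5):
    # logp_pos / logp_neg: summed token log-probabilities of the
    # human-preferred and dispreferred sequences under the model.
    # logp_ref: log-probability of the supervised fine-tuning target,
    # used here as a cross-entropy regularizer.
    # delta (margin) and lam (regularizer weight) are illustrative
    # defaults, not the paper's tuned settings.
    # Hinge term: the preferred sequence should beat the dispreferred
    # one by at least `delta` in log-likelihood.
    rank_term = torch.clamp(delta - logp_pos + logp_neg, min=0.0)
    # Regularizer: stay close to the fine-tuned model's targets.
    reg_term = -logp_ref
    return (rank_term + lam * reg_term).mean()

Because training only requires likelihoods from the model itself on fixed preference pairs, no reward model or policy rollout is needed in the training loop, which is where the simplicity and efficiency claims relative to PPO-based RLHF come from.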