Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
DOI:
10.48550/arxiv.2310.12921
Publication Date:
2023-01-01
AUTHORS (5)
ABSTRACT
Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we provide only a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second "baseline" prompt and projecting out parts of the CLIP embedding space that are irrelevant to distinguishing between the goal and the baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes we encountered are all related to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough. This suggests that future VLMs will become more and more useful reward models for a wide range of RL applications.
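The approach described in the abstract can be summarized in a few lines of code: embed the rendered frame and the goal prompt with CLIP and use their similarity as the reward, optionally projecting the frame embedding onto the goal-baseline direction before scoring. The sketch below illustrates this, assuming the open_clip package; the class name CLIPRewardModel, the default model choice, and the exact projection and reward formulas are illustrative assumptions, not the authors' implementation.

```python
import torch
import open_clip  # assumed dependency; any CLIP implementation exposes the same pieces
from PIL import Image


class CLIPRewardModel:
    """Illustrative VLM-RM: score each rendered frame against a single-sentence
    goal prompt using CLIP embeddings. Optionally uses a second "baseline"
    prompt to project the frame embedding onto the direction that separates
    goal from baseline before scoring (the abstract's goal-baseline idea)."""

    def __init__(self, goal_prompt, baseline_prompt=None, alpha=1.0,
                 model_name="ViT-L-14", pretrained="openai", device="cpu"):
        self.device = device
        self.alpha = alpha  # 0 = plain cosine-similarity reward, 1 = full projection
        self.model, _, self.preprocess = open_clip.create_model_and_transforms(
            model_name, pretrained=pretrained, device=device)
        self.tokenizer = open_clip.get_tokenizer(model_name)
        self.goal = self._embed_text(goal_prompt)
        self.baseline = self._embed_text(baseline_prompt) if baseline_prompt else None

    def _embed_text(self, prompt):
        tokens = self.tokenizer([prompt]).to(self.device)
        with torch.no_grad():
            emb = self.model.encode_text(tokens)
        return emb / emb.norm(dim=-1, keepdim=True)

    def reward(self, frame: Image.Image) -> float:
        """Return a scalar reward for one observation (a PIL image of the frame)."""
        pixels = self.preprocess(frame).unsqueeze(0).to(self.device)
        with torch.no_grad():
            state = self.model.encode_image(pixels)
        state = state / state.norm(dim=-1, keepdim=True)

        if self.baseline is None or self.alpha == 0.0:
            # Zero-shot reward: cosine similarity between the frame embedding
            # and the goal-prompt embedding.
            return (state @ self.goal.T).item()

        # Goal-baseline regularization (sketch): project the frame embedding
        # onto the line through the baseline and goal embeddings, keeping the
        # component that distinguishes goal from baseline, then blend with the
        # original embedding via alpha.
        direction = self.goal - self.baseline
        direction = direction / direction.norm(dim=-1, keepdim=True)
        projected = self.baseline + ((state - self.baseline) @ direction.T) * direction
        regularized = self.alpha * projected + (1.0 - self.alpha) * state

        # With normalized embeddings and alpha = 0 this reduces to the cosine
        # similarity above, since 1 - 0.5 * ||s - g||^2 = s · g.
        return (1.0 - 0.5 * (regularized - self.goal).pow(2).sum()).item()
```

In use, the reward call would stand in for the environment reward on each rendered frame, e.g. `CLIPRewardModel("a humanoid robot kneeling", baseline_prompt="a humanoid robot").reward(Image.fromarray(env.render()))`; the prompt wording and the value of alpha are hyperparameters here, not values taken from the paper.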