True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
DOI:
10.48550/arxiv.2401.14151
Publication Date:
2024-01-01
AUTHORS (6)
ABSTRACT
Despite their impressive performance across numerous tasks, large language models (LLMs) often fail at simple decision-making tasks because the knowledge in LLMs is misaligned with the environments. On the contrary, reinforcement learning (RL) agents learn policies from scratch, which keeps them aligned with environments but makes it difficult to incorporate prior knowledge for efficient exploration. To narrow the gap, we propose TWOSOME, a novel general online framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL, without requiring any prepared datasets or prior knowledge of the environments. First, we query the joint probabilities of each valid action with LLMs to form behavior policies. Then, to enhance the stability and robustness of the policies, we propose two normalization methods and summarize four prompt design principles. Finally, we design a novel parameter-efficient training architecture in which the actor and critic share one frozen LLM equipped with low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency and performance compared with the conventional RL method, PPO, and the prompt tuning method, SayCan, in both the classical decision-making environment, Overcooked, and the simulated household environment, VirtualHome. ii) Benefiting from LLMs' open-vocabulary feature, TWOSOME shows superior generalization ability to unseen tasks. iii) Under our framework, there is no significant loss of the LLMs' original ability during online PPO finetuning.
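To make the action-scoring step concrete, below is a minimal sketch (not the authors' released code) of how a frozen causal LLM can turn the joint token probabilities of each valid action into a behavior policy. The Hugging Face transformers API, the backbone model name, the prompt, and the action strings are illustrative assumptions, and only one normalization choice (dividing the summed log-probability by the action's token count) is shown; the paper's second normalization method, prompt design principles, and the LoRA/PPO update are omitted.

```python
# Sketch: score a fixed set of valid actions with a causal LLM and form a policy.
# Model name, prompt, and actions are hypothetical placeholders, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "You are in the kitchen and the dish needs a chopped tomato. You should "  # illustrative observation prompt
actions = ["pick up the tomato", "chop the tomato", "serve the dish"]               # valid actions from the environment


def action_logprob(prompt: str, action: str) -> tuple[float, int]:
    """Sum the log-probabilities of the action's tokens conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + action, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total, n_tokens = 0.0, 0
    # In a causal LM, the logits at position t-1 predict the token at position t.
    for t in range(prompt_len, full_ids.shape[1]):
        total += log_probs[0, t - 1, full_ids[0, t]].item()
        n_tokens += 1
    return total, n_tokens


# Token-level normalization: average log-probability per token, then softmax over valid actions.
scores = torch.tensor([lp / n for lp, n in (action_logprob(prompt, a) for a in actions)])
policy = torch.softmax(scores, dim=0)
print(dict(zip(actions, policy.tolist())))
```

Normalizing by token count keeps longer actions from being unfairly penalized when their token log-probabilities are summed, which is the kind of instability the normalization methods described in the abstract are meant to address.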