ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.48550/arXiv.2403.14589
Publication Date: 2024-03-21
ABSTRACT
Language agents have demonstrated autonomous decision-making abilities by reasoning with foundation models. Recently, efforts have been made to train language agents for performance improvement, with multi-step reasoning and action trajectories as the training data. However, collecting such trajectories still requires considerable human effort, by either artificial annotation or implementations of diverse prompting frameworks. In this work, we propose A$^3$T, a framework that enables the Autonomous Annotation of Agent Trajectories in the style of ReAct. The central role is an ActRe prompting agent, which explains the reason for an arbitrary action. When randomly sampling an external action, the ReAct-style agent could query the ActRe agent with the action to obtain its textual rationales. Novel trajectories are then synthesized by prepending the posterior reasoning from ActRe to the sampled action. In this way, the ReAct-style agent executes multiple trajectories for the failed tasks, and selects the successful ones to supplement its failed trajectory for contrastive self-training. Realized by policy gradient methods with binarized rewards, the contrastive self-training with accumulated trajectories facilitates a closed loop for multiple rounds of language agent self-improvement. We conduct experiments using QLoRA fine-tuning with the open-sourced Mistral-7B-Instruct-v0.2. In AlfWorld, the agent trained with A$^3$T obtains a 1-shot success rate of 96%, and 100% success with 4 iterative rounds. In WebShop, the 1-shot performance of the A$^3$T agent matches human average, and 4 rounds of iterative refinement lead to performance approaching human experts. A$^3$T agents significantly outperform existing techniques, including prompting with GPT-4, advanced agent frameworks, and fully fine-tuned LLMs.
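The annotation loop the abstract describes can be sketched in outline: sample an external action, query ActRe for a posterior rationale, prepend the rationale to form a ReAct-style (thought, action) step, then pair successful trajectories with the failed one under binarized rewards. The following is a minimal sketch only; every function name (`actre_rationale`, `sample_external_action`, and so on) is an illustrative stand-in, not the paper's actual API, and the LLM calls are replaced with stubs.

```python
import random

# Hedged sketch of the A^3T trajectory-synthesis loop from the abstract.
# All names here are hypothetical placeholders; real ActRe/ReAct agents
# would be backed by LLM calls and an environment such as AlfWorld.

def actre_rationale(state, action):
    """Stub for the ActRe prompting agent: given a state and an
    arbitrary action, return a textual reason for taking it."""
    return f"I choose '{action}' because it looks promising in '{state}'."

def sample_external_action(state):
    """Stub: randomly sample an action outside the policy's own choice."""
    return random.choice(["go north", "open drawer", "take key"])

def synthesize_trajectory(states):
    """Compose a ReAct-style trajectory: for each state, sample an
    external action, query ActRe for a posterior rationale, and
    prepend the rationale to the action as a (thought, action) pair."""
    trajectory = []
    for state in states:
        action = sample_external_action(state)
        reason = actre_rationale(state, action)  # posterior reasoning
        trajectory.append((reason, action))
    return trajectory

def contrastive_batch(failed, successes):
    """Pair a failed trajectory (binarized reward 0) with successful
    ones (reward 1) for reward-weighted, policy-gradient-style
    self-training on the accumulated data."""
    return [(t, 1.0) for t in successes] + [(failed, 0.0)]
```

In this reading, each self-improvement round synthesizes new trajectories for failed tasks, keeps the successful ones, and fine-tunes the policy (e.g. with QLoRA) on the reward-weighted batch before the next round.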