ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.48550/arXiv.2403.14589
Publication Date: 2024-03-21
ABSTRACT
Language agents have demonstrated autonomous decision-making abilities by reasoning with foundation models. Recently, efforts have been made to train language agents for performance improvement, with multi-step reasoning and action trajectories as the training data. However, collecting such trajectories still requires considerable human effort, by either artificial annotation or implementations of diverse prompting frameworks. In this work, we propose A$^3$T, a framework that enables the Autonomous Annotation of Agent Trajectories in the style of ReAct. The central role is an ActRe prompting agent, which explains the reason for an arbitrary action. When randomly sampling an external action, the ReAct-style agent could query the ActRe agent with the action to obtain its textual rationales. Novel trajectories are then synthesized by prepending the posterior reasoning from ActRe to the sampled action. In this way, the ReAct-style agent executes multiple trajectories for the failed tasks, and selects the successful ones to supplement its failed trajectory for contrastive self-training. Realized by policy gradient methods with binarized rewards, the contrastive self-training with accumulated trajectories facilitates a closed loop for multiple rounds of language agent self-improvement. We conduct experiments using QLoRA fine-tuning with the open-sourced Mistral-7B-Instruct-v0.2. In AlfWorld, the agent trained with A$^3$T obtains a 1-shot success rate of 96%, and 100% success with 4 iterative rounds. In WebShop, the 1-shot performance of the A$^3$T agent matches human average, and 4 rounds of iterative refinement lead to performance approaching human experts. A$^3$T agents significantly outperform existing techniques, including prompting with GPT-4, advanced agent frameworks, and fully fine-tuned LLMs.
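The annotation loop the abstract describes can be sketched in outline: sample an external action, query ActRe for a posterior rationale, prepend the rationale to form a ReAct-style (thought, action) step, then pair successful trajectories with the failed one under binarized rewards. The following is a minimal sketch only; every function name (`actre_rationale`, `sample_external_action`, and so on) is an illustrative stand-in, not the paper's actual API, and the LLM calls are replaced with stubs.

```python
import random

# Hedged sketch of the A^3T trajectory-synthesis loop from the abstract.
# All names here are hypothetical placeholders; real ActRe/ReAct agents
# would be backed by LLM calls and an environment such as AlfWorld.

def actre_rationale(state, action):
    """Stub for the ActRe prompting agent: given a state and an
    arbitrary action, return a textual reason for taking it."""
    return f"I choose '{action}' because it looks promising in '{state}'."

def sample_external_action(state):
    """Stub: randomly sample an action outside the policy's own choice."""
    return random.choice(["go north", "open drawer", "take key"])

def synthesize_trajectory(states):
    """Compose a ReAct-style trajectory: for each state, sample an
    external action, query ActRe for a posterior rationale, and
    prepend the rationale to the action as a (thought, action) pair."""
    trajectory = []
    for state in states:
        action = sample_external_action(state)
        reason = actre_rationale(state, action)  # posterior reasoning
        trajectory.append((reason, action))
    return trajectory

def contrastive_batch(failed, successes):
    """Pair a failed trajectory (binarized reward 0) with successful
    ones (reward 1) for reward-weighted, policy-gradient-style
    self-training on the accumulated data."""
    return [(t, 1.0) for t in successes] + [(failed, 0.0)]
```

In this reading, each self-improvement round synthesizes new trajectories for failed tasks, keeps the successful ones, and fine-tunes the policy (e.g. with QLoRA) on the reward-weighted batch before the next round.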