User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2309.13233 Publication Date: 2023-01-01
ABSTRACT
One of the major impediments to the development of new task-oriented dialogue (TOD) systems is the need for human evaluation at multiple stages and iterations of the development process. In an effort to move toward automated evaluation of TOD, we propose a novel user simulator built using recently developed large pretrained language models (LLMs). In order to increase the linguistic diversity of our system relative to related previous work, we do not fine-tune the LLMs used by our system on existing TOD datasets; rather, we use their in-context learning ability, prompting them to generate robust and linguistically diverse output with the goal of simulating the behavior of human interlocutors. Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system that achieves a GSR similar to that observed in human interactions with TOD systems. Using this approach, our current simulator is effectively able to interact with several TOD systems, especially on single-intent conversational goals, while generating lexically and syntactically diverse output relative to previous simulators that rely upon fine-tuned models. Finally, we collect a Human2Bot dataset of humans interacting with the same TOD systems with which we experimented, in order to better quantify these achievements.
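The abstract describes prompting an off-the-shelf LLM, via in-context learning rather than fine-tuning, to play the role of a human user pursuing a conversational goal. A minimal sketch of what such a goal-conditioned, few-shot prompt assembly might look like is below; the function names, prompt wording, and example dialogue are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of an in-context-learning user-simulator prompt builder.
# All names and prompt text here are hypothetical, not from the paper.

EXAMPLE_DIALOGUES = [
    # One few-shot example showing a simulated user pursuing a single-intent goal.
    (
        "Book a table for two at an Italian restaurant tonight.",
        [("USER", "Hi, I'd like to book a table for two tonight."),
         ("SYSTEM", "Sure, what cuisine do you prefer?"),
         ("USER", "Italian, please.")],
    ),
]

def format_dialogue(goal: str, turns: list) -> str:
    """Render one goal-conditioned dialogue as plain prompt text."""
    lines = [f"Goal: {goal}"]
    lines += [f"{speaker}: {utterance}" for speaker, utterance in turns]
    return "\n".join(lines)

def build_simulator_prompt(goal: str, history: list) -> str:
    """Assemble an in-context prompt: instructions, few-shot examples,
    then the current dialogue, ending where the simulated user speaks next."""
    instructions = (
        "You are simulating a human user of a task-oriented dialogue system. "
        "Pursue the stated goal naturally, varying your wording as a real "
        "person would. Reply with the next USER utterance only."
    )
    shots = "\n\n".join(format_dialogue(g, t) for g, t in EXAMPLE_DIALOGUES)
    current = format_dialogue(goal, history)
    return f"{instructions}\n\n{shots}\n\n{current}\nUSER:"

# The assembled prompt would then be sent to a pretrained LLM (no fine-tuning),
# e.g. next_user_turn = llm.complete(build_simulator_prompt(goal, history))
```

Because the few-shot examples steer style rather than a fine-tuned decoder, swapping in different example dialogues is one plausible way such a simulator could vary its lexical and syntactic output across runs.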