RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models

FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2402.13463 Publication Date: 2024-02-20
ABSTRACT
The application scope of large language models (LLMs) is increasingly expanding. In practical use, users might provide feedback based on the model's output, hoping for a responsive model that can complete responses according to their feedback. Whether the model can appropriately respond to users' refuting feedback and consistently follow through with execution has not been thoroughly analyzed. In light of this, this paper proposes a comprehensive benchmark, RefuteBench, covering tasks such as question answering, machine translation, and email writing. The evaluation aims to assess whether models can positively accept feedback in the form of refuting instructions and whether they can consistently adhere to user demands throughout the conversation. We conduct evaluations on numerous LLMs and find that they are stubborn, i.e., they exhibit an inclination toward their internal knowledge and often fail to comply with user feedback. Additionally, as the length of the conversation increases, models gradually forget the user's previously stated feedback and roll back to their own responses. We further propose recall-and-repeat prompts as a simple and effective way to enhance the model's responsiveness to feedback.
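
The sketch below illustrates one plausible reading of a "recall-and-repeat" prompt: before each new user turn, the previously issued refuting instructions are recalled and the model is asked to repeat and follow them, countering the tendency to roll back to its own answers in long conversations. The function name, reminder wording, and example dialogue are illustrative assumptions, not the prompt used in the paper.

```python
# Minimal sketch of a recall-and-repeat style prompt (assumed wording).
from typing import Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": ...}


def apply_recall_and_repeat(history: List[Message],
                            feedback_instructions: List[str],
                            new_user_turn: str) -> List[Message]:
    """Append the next user turn, prefixed with a reminder of earlier feedback.

    The reminder recalls every refuting instruction the user has issued so far
    and asks the model to repeat and obey them before answering.
    """
    if feedback_instructions:
        recall = (
            "Before answering, recall and repeat the constraints I gave you "
            "earlier, then follow all of them:\n"
            + "\n".join(f"- {inst}" for inst in feedback_instructions)
        )
        content = f"{recall}\n\n{new_user_turn}"
    else:
        content = new_user_turn
    return history + [{"role": "user", "content": content}]


if __name__ == "__main__":
    # Example: a translation task where the user previously refuted a word choice.
    history = [
        {"role": "user", "content": "Translate to German: 'The cat sat on the mat.'"},
        {"role": "assistant", "content": "Die Katze sass auf der Matte."},
        {"role": "user", "content": "Please translate 'mat' as 'Teppich' from now on."},
        {"role": "assistant", "content": "Understood: Die Katze sass auf dem Teppich."},
    ]
    turn = apply_recall_and_repeat(
        history,
        feedback_instructions=["Translate 'mat' as 'Teppich'."],
        new_user_turn="Now translate: 'The dog slept on the mat.'",
    )
    print(turn[-1]["content"])
```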