PersonaGym: Evaluating Persona Agents and LLMs
DOI:
10.48550/arxiv.2407.18416
Publication Date:
2024-07-25
AUTHORS (9)
ABSTRACT
Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses with different user requirements, thereby broadening the scope of agent applications. However, evaluating persona agent performance is incredibly challenging due to the complexity of assessing persona adherence in free-form interactions within environments relevant to each agent. We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, an automated, human-aligned metric grounded in decision theory for comprehensive, large-scale evaluation of persona agents. Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in the persona agent capabilities of state-of-the-art models. For example, Claude 3.5 Sonnet shows only a 2.97% relative improvement in PersonaScore over GPT 3.5, despite being a much more advanced model. Importantly, we find that increased model size and capability do not necessarily imply enhanced persona agent capabilities, thereby highlighting the pressing need for algorithmic and architectural invention towards faithful and performant persona agents.
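The abstract reports a 2.97% relative improvement in PersonaScore for Claude 3.5 Sonnet over GPT 3.5. As a minimal illustration only, assuming PersonaScore behaves like a mean of per-question rubric scores (the paper's actual aggregation, and all numbers below, are assumptions rather than data from this source), the comparison can be sketched as:

```python
def persona_score(question_scores):
    # Illustrative aggregation: mean of per-question rubric scores.
    # The paper's actual PersonaScore computation may differ.
    return sum(question_scores) / len(question_scores)

def relative_improvement_pct(candidate, baseline):
    # Relative improvement of `candidate` over `baseline`, in percent.
    return (candidate - baseline) / baseline * 100.0

# Hypothetical per-question scores for two models (not the paper's data):
model_a = persona_score([4.2, 3.9, 4.1])
model_b = persona_score([4.0, 3.8, 4.0])
print(f"{relative_improvement_pct(model_a, model_b):.2f}% relative improvement")
```

A small relative gap on such a metric, as in the abstract's Claude 3.5 Sonnet vs. GPT 3.5 comparison, is how the paper argues that model scale alone does not guarantee better persona adherence.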