PersonaGym: Evaluating Persona Agents and LLMs
DOI:
10.48550/arxiv.2407.18416
Publication Date:
2024-07-25
AUTHORS (9)
ABSTRACT
Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses with different user requirements, thereby broadening the scope of agent applications. However, evaluating persona agent performance is incredibly challenging due to the complexity of assessing persona adherence in free-form interactions within environments relevant to each agent. We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, an automated, human-aligned metric grounded in decision theory for comprehensive, large-scale evaluation of persona agents. Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in the persona agent capabilities of state-of-the-art models. For example, Claude 3.5 Sonnet shows only a 2.97% relative improvement in PersonaScore over GPT 3.5, despite being a much more advanced model. Importantly, we find that increased model size and capability do not necessarily imply enhanced persona agent capabilities, thereby highlighting the pressing need for algorithmic and architectural invention towards faithful and performant persona agents.
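The abstract reports a 2.97% relative improvement in PersonaScore for Claude 3.5 Sonnet over GPT 3.5. As a minimal illustration only, assuming PersonaScore behaves like a mean of per-question rubric scores (the paper's actual aggregation, and all numbers below, are assumptions rather than data from this source), the comparison can be sketched as:

```python
def persona_score(question_scores):
    # Illustrative aggregation: mean of per-question rubric scores.
    # The paper's actual PersonaScore computation may differ.
    return sum(question_scores) / len(question_scores)

def relative_improvement_pct(candidate, baseline):
    # Relative improvement of `candidate` over `baseline`, in percent.
    return (candidate - baseline) / baseline * 100.0

# Hypothetical per-question scores for two models (not the paper's data):
model_a = persona_score([4.2, 3.9, 4.1])
model_b = persona_score([4.0, 3.8, 4.0])
print(f"{relative_improvement_pct(model_a, model_b):.2f}% relative improvement")
```

A small relative gap on such a metric, as in the abstract's Claude 3.5 Sonnet vs. GPT 3.5 comparison, is how the paper argues that model scale alone does not guarantee better persona adherence.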