Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

DOI: 10.48550/arXiv.2403.04132 Publication Date: 2024-03-06
ABSTRACT
Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating their alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing. The platform has been operational for several months, amassing over 240K votes. This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. We confirm that the crowdsourced questions are sufficiently diverse and discriminating and that the crowdsourced human votes are in good agreement with those of expert raters. These analyses collectively establish a robust foundation for the credibility of Chatbot Arena. Because of its unique value and openness, Chatbot Arena has emerged as one of the most referenced LLM leaderboards, widely cited by leading LLM developers and companies. The demo is publicly available at \url{https://chat.lmsys.org}.
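
As a rough illustration of how pairwise preference votes can be turned into a model ranking, the sketch below fits Bradley-Terry-style strength scores to a handful of hypothetical votes by gradient ascent on the log-likelihood, then rescales them to an Elo-like range for readability. The vote data, function name, tie handling, and rescaling constants are illustrative assumptions for this sketch, not the platform's actual implementation or data.

    import math
    from collections import defaultdict

    # Hypothetical pairwise votes: (model_a, model_b, winner),
    # where winner is "a", "b", or "tie". Illustrative data only.
    votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-z", "a"),
        ("model-y", "model-z", "b"),
        ("model-y", "model-x", "tie"),
    ]

    def bradley_terry_scores(votes, iters=200, lr=0.1):
        """Fit log-strength scores by gradient ascent on the Bradley-Terry
        likelihood; a tie is counted as half a win for each side."""
        models = sorted({m for a, b, _ in votes for m in (a, b)})
        score = {m: 0.0 for m in models}
        for _ in range(iters):
            grad = defaultdict(float)
            for a, b, winner in votes:
                # Probability that model a beats model b under current scores.
                p_a = 1.0 / (1.0 + math.exp(score[b] - score[a]))
                w_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
                grad[a] += w_a - p_a                     # d logL / d score[a]
                grad[b] += (1.0 - w_a) - (1.0 - p_a)     # d logL / d score[b]
            for m in models:
                score[m] += lr * grad[m]
        # Center scores (they are only identified up to a constant), then map
        # to an Elo-like scale around 1000 purely for readability.
        mean = sum(score.values()) / len(score)
        return {m: 400.0 / math.log(10) * (s - mean) + 1000.0
                for m, s in score.items()}

    if __name__ == "__main__":
        for model, rating in sorted(bradley_terry_scores(votes).items(),
                                    key=lambda kv: -kv[1]):
            print(f"{model}: {rating:.1f}")

Running the script prints the three hypothetical models ordered by their fitted rating; with real Arena-scale vote counts, the same maximum-likelihood idea yields stable rankings with confidence intervals estimated by resampling.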