Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

DOI: 10.48550/arXiv.2403.04132 Publication Date: 2024-03-06
ABSTRACT
Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating their alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing. The platform has been operational for several months, amassing over 240K votes. This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. We confirm that the crowdsourced questions are sufficiently diverse and discriminating and that the crowdsourced human votes are in good agreement with those of expert raters. These analyses collectively establish a robust foundation for the credibility of Chatbot Arena. Because of its unique value and openness, Chatbot Arena has emerged as one of the most referenced LLM leaderboards, widely cited by leading LLM developers and companies. The demo is publicly available at \url{https://chat.lmsys.org}.
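
As a rough illustration of how pairwise preference votes can be turned into a model ranking, the sketch below fits Bradley-Terry-style strength scores to a handful of hypothetical votes by gradient ascent on the log-likelihood, then rescales them to an Elo-like range for readability. The vote data, function name, tie handling, and rescaling constants are illustrative assumptions for this sketch, not the platform's actual implementation or data.

    import math
    from collections import defaultdict

    # Hypothetical pairwise votes: (model_a, model_b, winner),
    # where winner is "a", "b", or "tie". Illustrative data only.
    votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-z", "a"),
        ("model-y", "model-z", "b"),
        ("model-y", "model-x", "tie"),
    ]

    def bradley_terry_scores(votes, iters=200, lr=0.1):
        """Fit log-strength scores by gradient ascent on the Bradley-Terry
        likelihood; a tie is counted as half a win for each side."""
        models = sorted({m for a, b, _ in votes for m in (a, b)})
        score = {m: 0.0 for m in models}
        for _ in range(iters):
            grad = defaultdict(float)
            for a, b, winner in votes:
                # Probability that model a beats model b under current scores.
                p_a = 1.0 / (1.0 + math.exp(score[b] - score[a]))
                w_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
                grad[a] += w_a - p_a                     # d logL / d score[a]
                grad[b] += (1.0 - w_a) - (1.0 - p_a)     # d logL / d score[b]
            for m in models:
                score[m] += lr * grad[m]
        # Center scores (they are only identified up to a constant), then map
        # to an Elo-like scale around 1000 purely for readability.
        mean = sum(score.values()) / len(score)
        return {m: 400.0 / math.log(10) * (s - mean) + 1000.0
                for m, s in score.items()}

    if __name__ == "__main__":
        for model, rating in sorted(bradley_terry_scores(votes).items(),
                                    key=lambda kv: -kv[1]):
            print(f"{model}: {rating:.1f}")

Running the script prints the three hypothetical models ordered by their fitted rating; with real Arena-scale vote counts, the same maximum-likelihood idea yields stable rankings with confidence intervals estimated by resampling.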