Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

Crowd sourcing
DOI: 10.1145/3589335.3651243
Publication Date: 2024-05-12
ABSTRACT
Large language model evaluation plays a pivotal role in the enhancement of its capacity. Previously, numerous methods for evaluating large language models have been proposed in this area. Despite their effectiveness, these existing works mainly focus on assessing objective questions, overlooking the capability to evaluate subjective questions, which is extremely common for large language models. Additionally, these methods predominantly utilize centralized datasets for evaluation, with question banks concentrated within the evaluation platforms themselves. Moreover, the evaluation processes employed by these platforms often overlook personalized factors, neglecting to consider the individual characteristics of both the evaluators and the models being evaluated. To address these limitations, we propose a novel anonymous crowd-sourcing platform, BingJian, which employs a competitive scoring mechanism where users participate in ranking models based on their performance. This platform stands out in that it not only supports centralized evaluations to assess the general capabilities of models but also offers an open evaluation gateway. Through this gateway, users have the opportunity to submit their own questions, testing models on a personalized and potentially broader range of capabilities. Furthermore, our platform introduces personalized evaluation scenarios, leveraging various forms of human-computer interaction to evaluate large language models in a manner that accounts for individual user preferences and contexts. The demonstration of BingJian can be accessed at https://github.com/Mingyue-Cheng/Bingjian.
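The abstract does not specify the exact form of the competitive scoring mechanism. The sketch below is a minimal, hypothetical illustration of how anonymous pairwise crowd votes could be aggregated into a model ranking via an Elo-style update; the function names, K-factor, and sample models are assumptions for illustration only, not BingJian's actual implementation.

```python
# Minimal sketch (assumption): an Elo-style update over pairwise crowd votes.
# BingJian's real scoring mechanism may differ; this only illustrates how user
# preferences between two model responses could be turned into a ranking.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one crowd-sourced comparison."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

if __name__ == "__main__":
    # Hypothetical models, both starting from the same baseline rating.
    ratings = {"model_x": 1500.0, "model_y": 1500.0}
    # A user anonymously prefers model_x's answer to a submitted question.
    ratings["model_x"], ratings["model_y"] = elo_update(
        ratings["model_x"], ratings["model_y"], a_wins=True
    )
    print(ratings)
```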