Aligning Black-box Language Models with Human Judgments

DOI: 10.48550/arxiv.2502.04997
Publication Date: 2025-02-07
ABSTRACT
Large language models (LLMs) are increasingly used as automated judges to evaluate recommendation systems, search engines, and other subjective tasks, where relying on human evaluators can be costly, time-consuming, and unscalable. LLMs offer an efficient solution for continuous, automated evaluation. However, since the systems that are built and improved with these judgments are ultimately designed for human use, it is crucial that LLM judgments align closely with human judgments to ensure such systems remain human-centered. Aligning LLM judgments with human evaluators is challenging, however, due to individual variability and biases in human judgments. We propose a simple yet effective framework to align LLM judgments with individual human evaluators or their aggregated judgments, without retraining or fine-tuning the LLM. Our approach learns a linear mapping between the LLM's outputs and human judgments, achieving over 142% average improvement in agreement across 29 tasks with only a small number of calibration examples used for training. Notably, our method works in zero-shot and few-shot settings, exceeds inter-human agreement on four out of six tasks, and enables smaller LLMs to achieve performance comparable to that of larger models.
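The sketch below illustrates the general idea described in the abstract: calibrating a black-box LLM judge by fitting a linear mapping from its raw scores to human judgments on a small calibration set, with no retraining of the LLM itself. The data, the choice of ordinary least squares, and the Kendall's tau agreement metric are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: align black-box LLM judge scores with human judgments via a
# learned linear mapping. All specifics (scale, estimator, metric) are assumed.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import kendalltau

# Hypothetical calibration set: raw scores the LLM judge assigned to a handful
# of items, paired with human judgments for the same items (e.g., a 1-5 scale).
llm_scores = np.array([[3.8], [2.1], [4.6], [1.5], [3.0], [4.1]])
human_scores = np.array([3.0, 2.0, 5.0, 1.0, 3.0, 4.0])

# Fit the linear mapping f(x) = w * x + b on the calibration examples only;
# the underlying LLM is treated as a black box and is never fine-tuned.
mapping = LinearRegression().fit(llm_scores, human_scores)

# Apply the learned mapping to new, unseen LLM judgments.
new_llm_scores = np.array([[2.7], [4.3]])
aligned_scores = mapping.predict(new_llm_scores)
print("Aligned scores:", aligned_scores)

# Agreement with humans can then be measured with a rank correlation such as
# Kendall's tau (one plausible choice; the paper's metric may differ).
tau, _ = kendalltau(mapping.predict(llm_scores), human_scores)
print(f"Kendall tau on calibration set: {tau:.2f}")
```

In practice, a separate mapping could be fit per human evaluator or against their aggregated judgments, which is consistent with the framework's claim of supporting both alignment targets.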