Aligning Black-box Language Models with Human Judgments

DOI: 10.48550/arxiv.2502.04997
Publication Date: 2025-02-07
ABSTRACT
Large language models (LLMs) are increasingly used as automated judges to evaluate recommendation systems, search engines, and other subjective tasks, where relying on human evaluators can be costly, time-consuming, and unscalable. LLMs offer an efficient solution for continuous, automated evaluation. However, since the systems that are built and improved with these judgments are ultimately designed for human use, it is crucial that LLM judgments align closely with human judgments to ensure such systems remain human-centered. Aligning LLM judgments with human evaluators is challenging, however, due to individual variability and biases in human judgments. We propose a simple yet effective framework to align LLM judgments with individual human evaluators or their aggregated judgments, without retraining or fine-tuning the LLM. Our approach learns a linear mapping between the LLM's outputs and human judgments, achieving over 142% average improvement in agreement across 29 tasks with only a small number of calibration examples used for training. Notably, our method works in zero-shot and few-shot settings, exceeds inter-human agreement on four out of six tasks, and enables smaller LLMs to achieve performance comparable to that of larger models.
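The sketch below illustrates the general idea described in the abstract: calibrating a black-box LLM judge by fitting a linear mapping from its raw scores to human judgments on a small calibration set, with no retraining of the LLM itself. The data, the choice of ordinary least squares, and the Kendall's tau agreement metric are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: align black-box LLM judge scores with human judgments via a
# learned linear mapping. All specifics (scale, estimator, metric) are assumed.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import kendalltau

# Hypothetical calibration set: raw scores the LLM judge assigned to a handful
# of items, paired with human judgments for the same items (e.g., a 1-5 scale).
llm_scores = np.array([[3.8], [2.1], [4.6], [1.5], [3.0], [4.1]])
human_scores = np.array([3.0, 2.0, 5.0, 1.0, 3.0, 4.0])

# Fit the linear mapping f(x) = w * x + b on the calibration examples only;
# the underlying LLM is treated as a black box and is never fine-tuned.
mapping = LinearRegression().fit(llm_scores, human_scores)

# Apply the learned mapping to new, unseen LLM judgments.
new_llm_scores = np.array([[2.7], [4.3]])
aligned_scores = mapping.predict(new_llm_scores)
print("Aligned scores:", aligned_scores)

# Agreement with humans can then be measured with a rank correlation such as
# Kendall's tau (one plausible choice; the paper's metric may differ).
tau, _ = kendalltau(mapping.predict(llm_scores), human_scores)
print(f"Kendall tau on calibration set: {tau:.2f}")
```

In practice, a separate mapping could be fit per human evaluator or against their aggregated judgments, which is consistent with the framework's claim of supporting both alignment targets.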