Benchmarking LLMs for Political Science: A United Nations Perspective
DOI:
10.48550/arxiv.2502.14122
Publication Date:
2025-02-19
AUTHORS (9)
ABSTRACT
Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses the gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span three stages of the UN decision-making process--drafting, voting, and discussing--and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates the challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. This work contributes to the growing intersection of AI and political science, opening new avenues for research and practical applications in global governance. The UNBench repository can be accessed at: https://github.com/yueqingliang1/UNBench.
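
To make the task setup concrete, below is a minimal Python sketch of how the four UNBench tasks could be wired into a single evaluation loop. It is hypothetical: the Example fields, the predict() stub, the exact-match scoring, and the toy records are invented for illustration and do not reflect the repository's actual data format or API; see the UNBench repository linked above for the real evaluation code.

# Hypothetical sketch of an evaluation loop over the four UNBench tasks.
# The record fields, the predict() stub, and the toy examples are invented
# placeholders; they are not the paper's actual data schema or harness.
from dataclasses import dataclass

@dataclass
class Example:
    task: str       # one of the four UNBench tasks
    prompt: str     # text shown to the LLM (draft resolution, speech context, ...)
    reference: str  # gold label or reference text

TASKS = (
    "co-penholder judgment",                # drafting stage
    "representative voting simulation",     # voting stage
    "draft adoption prediction",            # voting stage
    "representative statement generation",  # discussing stage
)

def predict(example: Example) -> str:
    """Placeholder for an LLM call; a real harness would query a model here."""
    return "abstain"  # trivial stub so the loop runs end to end

def evaluate(examples: list[Example]) -> dict[str, float]:
    """Exact-match accuracy per task (the generation task would need text metrics)."""
    correct = {t: 0 for t in TASKS}
    total = {t: 0 for t in TASKS}
    for ex in examples:
        total[ex.task] += 1
        if predict(ex).strip().lower() == ex.reference.strip().lower():
            correct[ex.task] += 1
    return {t: correct[t] / total[t] for t in TASKS if total[t]}

if __name__ == "__main__":
    toy = [
        Example(TASKS[1], "How does France vote on this draft?", "in favour"),
        Example(TASKS[2], "Will this draft be adopted?", "adopted"),
    ]
    print(evaluate(toy))  # per-task accuracy on the toy records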