LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

DOI: 10.48550/arxiv.2308.11462 Publication Date: 2023-01-01
ABSTRACT
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, the tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving LLM developers and legal professionals a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
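The abstract describes LegalBench as a suite of 162 expert-written tasks on which 20 open-source and commercial LLMs were evaluated. The Python sketch below illustrates the general shape of such an evaluation under stated assumptions: each task supplies labeled examples, a few-shot prompt is built from a handful of demonstrations, and exact-match accuracy is reported. The tasks/abercrombie path, the train.tsv/test.tsv layout, the "text"/"answer" column names, and the query_llm stub are illustrative assumptions, not the paper's official evaluation harness.

import csv

def load_tsv(path):
    # Each task row pairs an input text with a gold answer (assumed columns).
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def build_prompt(demos, example):
    # Few-shot prompt: labeled demonstrations followed by the query to answer.
    parts = [f"Text: {d['text']}\nAnswer: {d['answer']}" for d in demos]
    parts.append(f"Text: {example['text']}\nAnswer:")
    return "\n\n".join(parts)

def query_llm(prompt):
    # Placeholder: swap in a real call to whichever LLM API you are testing.
    return ""

def evaluate(task_dir, n_demos=4):
    # Build prompts from a few training demonstrations, score the test split
    # with exact-match accuracy (case- and whitespace-insensitive).
    demos = load_tsv(f"{task_dir}/train.tsv")[:n_demos]
    test = load_tsv(f"{task_dir}/test.tsv")
    correct = 0
    for ex in test:
        pred = query_llm(build_prompt(demos, ex)).strip().lower()
        correct += int(pred == ex["answer"].strip().lower())
    return correct / len(test)

if __name__ == "__main__":
    # Hypothetical task directory; point this at a real task's TSV files.
    print(evaluate("tasks/abercrombie"))

In practice, per-task metrics vary (some LegalBench tasks are classification, others extraction or generation), so exact-match accuracy here stands in for whichever metric a given task calls for.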