LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

DOI: 10.48550/arxiv.2308.11462 Publication Date: 2023-01-01
ABSTRACT
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, the tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving LLM developers and legal professionals a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
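The abstract describes LegalBench as a suite of 162 expert-written tasks on which 20 open-source and commercial LLMs were evaluated. The Python sketch below illustrates the general shape of such an evaluation under stated assumptions: each task supplies labeled examples, a few-shot prompt is built from a handful of demonstrations, and exact-match accuracy is reported. The tasks/abercrombie path, the train.tsv/test.tsv layout, the "text"/"answer" column names, and the query_llm stub are illustrative assumptions, not the paper's official evaluation harness.

import csv

def load_tsv(path):
    # Each task row pairs an input text with a gold answer (assumed columns).
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def build_prompt(demos, example):
    # Few-shot prompt: labeled demonstrations followed by the query to answer.
    parts = [f"Text: {d['text']}\nAnswer: {d['answer']}" for d in demos]
    parts.append(f"Text: {example['text']}\nAnswer:")
    return "\n\n".join(parts)

def query_llm(prompt):
    # Placeholder: swap in a real call to whichever LLM API you are testing.
    return ""

def evaluate(task_dir, n_demos=4):
    # Build prompts from a few training demonstrations, score the test split
    # with exact-match accuracy (case- and whitespace-insensitive).
    demos = load_tsv(f"{task_dir}/train.tsv")[:n_demos]
    test = load_tsv(f"{task_dir}/test.tsv")
    correct = 0
    for ex in test:
        pred = query_llm(build_prompt(demos, ex)).strip().lower()
        correct += int(pred == ex["answer"].strip().lower())
    return correct / len(test)

if __name__ == "__main__":
    # Hypothetical task directory; point this at a real task's TSV files.
    print(evaluate("tasks/abercrombie"))

In practice, per-task metrics vary (some LegalBench tasks are classification, others extraction or generation), so exact-match accuracy here stands in for whichever metric a given task calls for.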