LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

Benchmark (surveying) Best practice
DOI: 10.18653/v1/2023.acl-long.865 Publication Date: 2023-08-05T00:57:42Z
ABSTRACT
In this work, we conduct a detailed analysis on the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as the upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs. We release two new legal PLMs trained on LeXFiles and evaluate them alongside others on LegalLAMA and LexGLUE. We find that probing performance strongly correlates with upstream performance in related legal topics. On the other hand, downstream performance is mainly driven by the model's size and prior legal knowledge, which can be estimated by upstream and probing performance. Based on these findings, we can conclude that both dimensions are important for those seeking the development of domain-specific PLMs.
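The abstract's claim that one kind of performance correlates with another can be made concrete with a correlation coefficient. The following is a minimal sketch, not the authors' evaluation code: it assumes one score per model (or per topic) on each axis and computes a plain Pearson correlation. The function name `pearson_r` and the toy inputs are hypothetical and carry no results from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy placeholder scores, NOT taken from the paper: one entry per model.
hypothetical_upstream = [1.0, 2.0, 3.0, 4.0]
hypothetical_probing = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(hypothetical_upstream, hypothetical_probing)
```

A value of `r` near 1 or -1 indicates a strong linear relationship between the two score lists; whether Pearson or a rank-based coefficient is the better fit depends on the metrics being compared.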