FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning
DOI: 10.48550/arxiv.2404.02127
Publication Date: 2024-04-02
ABSTRACT
Instruction tuning is an important step in making language models useful for direct user interaction. However, many legal tasks remain out of reach for most open LLMs, and there do not yet exist any large-scale instruction datasets for the domain. This critically limits research in this application area. In this work, we curate LawInstruct, a large legal instruction dataset covering 17 jurisdictions, 24 languages, and a total of 12M examples. We present evidence that domain-specific pretraining and instruction tuning improve performance on LegalBench, including improving Flan-T5 XL by 8 points, or 16%, over the baseline. However, the effect does not generalize across all tasks, training regimes, model sizes, and other factors. LawInstruct is a resource for accelerating the development of models with stronger information processing and decision-making capabilities in the legal domain.
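The abstract describes instruction tuning a Flan-T5 model on legal (instruction, answer) pairs. The following is a minimal sketch of a single seq2seq instruction-tuning step using the Hugging Face transformers library; it is not the paper's training setup. The checkpoint google/flan-t5-small is a lightweight stand-in for the Flan-T5 XL used in the paper, and the clause-classification example is hypothetical, not drawn from LawInstruct.

```python
# Minimal sketch of one seq2seq instruction-tuning step, assuming torch and
# transformers are installed. A real run would iterate over the full dataset
# in batches for many steps rather than a single example.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"  # small stand-in for Flan-T5 XL
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One illustrative legal instruction example (hypothetical content).
instruction = (
    "Decide whether the following clause is a limitation-of-liability "
    "clause. Answer Yes or No.\n\nClause: In no event shall either party "
    "be liable for indirect or consequential damages."
)
target = "Yes"

inputs = tokenizer(instruction, return_tensors="pt", truncation=True)
labels = tokenizer(text_target=target, return_tensors="pt").input_ids

# Standard cross-entropy loss on the target tokens, followed by one
# gradient update.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```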