XNLIeu: a dataset for cross-lingual NLI in Basque

FOS: Computer and information sciences; Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
DOI: 10.48550/arxiv.2404.06996 Publication Date: 2024-04-10
ABSTRACT
XNLI is a popular Natural Language Inference (NLI) benchmark widely used to evaluate cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this paper, we expand XNLI to include Basque, a low-resource language that can greatly benefit from transfer-learning approaches. The new dataset, dubbed XNLIeu, has been developed by first machine-translating the English XNLI corpus into Basque, followed by a manual post-edition step. We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-edition on the MT system; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation. The results show that post-edition is necessary and that the translate-train cross-lingual strategy obtains better results overall, although the gain is lower when tested on a dataset that has been built natively from scratch. Our code and datasets are publicly available under open licenses.
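As a rough illustration of the translate-train strategy named in the abstract, the sketch below translates English training examples into the target language before training, while evaluation uses target-language test data. It is a minimal sketch under stated assumptions, not the authors' pipeline: `NLIExample`, `translate_examples`, `build_translate_train_splits`, and the placeholder `toy_translate` function are all hypothetical names introduced here, and a real setup would call an actual MT system instead of the placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NLIExample:
    """One NLI item (hypothetical container, not the dataset's real schema)."""
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"
    lang: str


def translate_examples(
    examples: List[NLIExample],
    translate: Callable[[str], str],
    target_lang: str,
) -> List[NLIExample]:
    """Translate premise and hypothesis; the gold label is carried over unchanged."""
    return [
        NLIExample(
            premise=translate(ex.premise),
            hypothesis=translate(ex.hypothesis),
            label=ex.label,
            lang=target_lang,
        )
        for ex in examples
    ]


def build_translate_train_splits(
    english_train: List[NLIExample],
    target_test: List[NLIExample],
    translate: Callable[[str], str],
    target_lang: str = "eu",
) -> Dict[str, List[NLIExample]]:
    """Translate-train: train on translated data, test on target-language data."""
    return {
        "train": translate_examples(english_train, translate, target_lang),
        "test": list(target_test),
    }


def toy_translate(text: str) -> str:
    """Placeholder standing in for a real MT system (assumption for this sketch)."""
    return f"[eu] {text}"


english_train = [
    NLIExample("A man is eating.", "Someone is eating.", "entailment", "en"),
    NLIExample("A man is eating.", "Nobody is eating.", "contradiction", "en"),
]
# Placeholder strings only; real test items would be actual Basque sentences.
target_test = [NLIExample("<Basque premise>", "<Basque hypothesis>", "neutral", "eu")]

splits = build_translate_train_splits(english_train, target_test, toy_translate)
```

The point of the design is that labels survive translation untouched, so the quality of the translated text (raw MT versus post-edited) is the only variable that changes between training sets.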