Julen Etxaniz

ORCID: 0009-0000-2099-7766
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Translation Studies and Practices
  • Text Readability and Simplification
  • Explainable Artificial Intelligence (XAI)
  • Engineering and Information Technology
  • Advanced Image and Video Retrieval Techniques
  • Technology in Education and Healthcare
  • Robotics and Sensor-Based Localization
  • Interpreting and Communication in Healthcare
  • Deception detection and forensic psychology
  • Media and Digital Communication
  • Multimodal Machine Learning Applications
  • Digital Humanities and Scholarship
  • Basque language and culture studies
  • Computational and Text Analysis Methods
  • Robotic Path Planning Algorithms
  • Epistemology, Ethics, and Metaphysics
  • Speech Recognition and Synthesis
  • Spanish Linguistics and Language Studies

University of the Basque Country (2023)

Ikerlan (1994)

In this position paper, we argue that the classical evaluation of Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark and then evaluated on the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contamination causes an overestimation of the performance of the contaminated model on the target benchmark and the associated task with respect to their non-contaminated counterparts...

10.18653/v1/2023.findings-emnlp.722 article EN cc-by 2023-01-01
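
The abstract notes that the extent of contamination is hard to measure. As a generic illustration of the kind of heuristic researchers fall back on (not the protocol proposed in this paper), the sketch below flags test examples that share a long word n-gram with a training corpus; the data, the 8-gram window, and the function names are all hypothetical.

```python
# Minimal sketch of a common contamination heuristic: n-gram overlap
# between a training corpus and a benchmark test split.
# Hypothetical data and window size; not the paper's proposed method.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(train_docs: list, test_examples: list, n: int = 8) -> float:
    """Fraction of test examples sharing at least one n-gram with training data."""
    train_ngrams = set()
    for doc in train_docs:
        train_ngrams |= ngrams(doc, n)
    flagged = sum(1 for ex in test_examples if ngrams(ex, n) & train_ngrams)
    return flagged / len(test_examples)

# Toy usage: a test example copied verbatim into the training corpus is flagged.
train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
test = ["the quick brown fox jumps over the lazy dog near the river bank today",
        "an entirely different sentence about evaluation benchmarks and models here"]
print(contamination_rate(train, test))  # 0.5
```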

We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations of large language models (LLMs) have primarily been conducted in English. However, the ability of LLMs to maintain truthfulness across languages remains under-explored. Our study evaluates 12 state-of-the-art open LLMs, comparing base and instruction-tuned variants using human evaluation, multiple-choice metrics, and LLM-as-a-Judge scoring. Our findings reveal...

10.48550/arxiv.2502.09387 preprint EN arXiv (Cornell University) 2025-02-13
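
One of the evaluation modes mentioned above is multiple-choice metrics. A rough sketch of how such a metric can be computed with a causal LM, scoring each answer choice by its conditional log-likelihood, is shown below; the small stand-in model and example question are placeholders, not the paper's exact setup.

```python
# Sketch of a multiple-choice metric: the model is "correct" when the true
# answer receives the highest conditional log-likelihood given the question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper evaluates 12 open LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def choice_logprob(question: str, answer: str) -> float:
    """Sum of token log-probs of `answer` conditioned on `question`.
    Approximation: assumes the question tokenizes identically as a prefix."""
    q_ids = tok(question, return_tensors="pt").input_ids
    full_ids = tok(question + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    answer_positions = range(q_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(log_probs[i, full_ids[0, i + 1]].item() for i in answer_positions)

question = "What happens if you crack your knuckles a lot?"
choices = ["Nothing in particular happens.", "You will get arthritis."]
scores = [choice_logprob(question, c) for c in choices]
print(choices[scores.index(max(scores))])  # the choice the model prefers
```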

We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque corpus comprising 4.3M documents and 4.2B tokens. Addressing the scarcity of high-quality benchmarks for Basque, we further introduce 4 multiple-choice evaluation datasets: EusProficiency, comprising 5,169 questions from official language proficiency exams; EusReading, comprising 352 reading comprehension questions; EusTrivia, comprising 1,715 trivia questions from 5 knowledge areas; and EusExams, comprising 16,774 questions from public...

10.48550/arxiv.2403.20266 preprint EN arXiv (Cornell University) 2024-03-29
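
Continued pretraining, as described above, amounts to resuming the causal language modeling objective on the new corpus starting from the Llama 2 checkpoint. A minimal sketch with the Hugging Face Trainer follows; the corpus file, sequence length, and hyperparameters are placeholders, and the actual training ran at a far larger scale.

```python
# Rough sketch of continued pretraining (causal LM objective) from Llama 2
# weights, in the spirit of Latxa. Corpus path and hyperparameters are
# hypothetical; this is not the paper's training configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # gated model; requires access approval
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical Basque corpus with one document per line.
corpus = load_dataset("text", data_files={"train": "basque_corpus.txt"})["train"]
corpus = corpus.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
                    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="latxa-style-ckpt",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=64,
                           learning_rate=1e-4,
                           num_train_epochs=1),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```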

XNLI is a popular Natural Language Inference (NLI) benchmark widely used to evaluate cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this paper, we expand XNLI to include Basque, a low-resource language that can greatly benefit from transfer-learning approaches. The new dataset, dubbed XNLIeu, has been developed by first machine-translating the English XNLI corpus into Basque, followed by a manual post-edition step. We have conducted a series of experiments using mono- and multilingual LLMs to assess a)...

10.48550/arxiv.2404.06996 preprint EN arXiv (Cornell University) 2024-04-10
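
The construction pipeline described above has two stages: machine translation, then manual post-editing. Below is a sketch of the first stage only, using NLLB-200 purely as an example MT system that covers Basque (eus_Latn); the paper does not necessarily use this model.

```python
# Sketch of the machine-translation stage of a translate-then-post-edit
# pipeline for an NLI pair. NLLB-200 is an illustrative choice of MT system.
from transformers import pipeline

mt = pipeline("translation",
              model="facebook/nllb-200-distilled-600M",
              src_lang="eng_Latn", tgt_lang="eus_Latn")

pair = {"premise": "A man is playing a guitar on stage.",
        "hypothesis": "A person is performing music."}

draft = {k: mt(v)[0]["translation_text"] for k, v in pair.items()}
print(draft)  # raw MT output; a human post-editor would then correct it
```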

Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how well these models perform on topics relevant to other cultures, whose presence on the web is not that prominent. To address this gap, we introduce BertaQA, a multiple-choice trivia dataset parallel in English and Basque. The dataset consists of a local subset with questions pertinent to the Basque culture, and a global subset with questions of broader interest. We find...

10.48550/arxiv.2406.07302 preprint EN arXiv (Cornell University) 2024-06-11

In this article we present the Latxa language models (LMs), the largest LMs developed for Basque to date. The models range from 7 billion to 70 billion parameters and are derived from the English Llama 2 models. To build them, a process called continued pretraining was applied to Llama 2, using a Basque corpus comprising 4.3 million documents and 4.2 billion tokens. To address the scarcity of high-quality evaluation sets for Basque, we have compiled four new sets: EusProficiency, based on the official EGA...

10.1387/ekaia.26338 article EKAIA Euskal Herriko Unibertsitateko Zientzi eta Teknologi Aldizkaria 2024-09-24

In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot Filling and Dialect Identification, evaluated using data in different dialects of the Norwegian language. For Intent Detection and Slot Filling, we have fine-tuned a multitask model in a cross-lingual setting, to leverage the xSID dataset available in 17 languages. In the case of Dialect Identification, our final submission consists of a model fine-tuned on the provided development set, which has obtained the highest scores within...

10.48550/arxiv.2412.10095 preprint EN arXiv (Cornell University) 2024-12-13
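
The multitask model mentioned above follows the standard joint intent-detection and slot-filling architecture: a shared encoder with a sentence-level classification head and a token-level tagging head. A minimal sketch is below, with the encoder choice and label counts as placeholders.

```python
# Minimal sketch of a joint intent-detection / slot-filling model on a
# shared multilingual encoder, as in xSID-style multitask fine-tuning.
# Encoder name and label counts are hypothetical.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class JointIntentSlotModel(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-base", n_intents=18, n_slots=41):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.intent_head = nn.Linear(hidden, n_intents)  # sentence-level label
        self.slot_head = nn.Linear(hidden, n_slots)      # per-token BIO tags

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = out.last_hidden_state
        intent_logits = self.intent_head(hidden_states[:, 0])  # <s> token
        slot_logits = self.slot_head(hidden_states)
        return intent_logits, slot_logits

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = JointIntentSlotModel()
batch = tok(["wake me up at eight tomorrow"], return_tensors="pt")
intent_logits, slot_logits = model(batch.input_ids, batch.attention_mask)
# Training sums a cross-entropy loss over intents and one over slot tags.
```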

Translate-test is a popular technique to improve the performance of multilingual language models. This approach works by translating the input into English using an external machine translation system, and running inference over the translated input. However, these improvements can be attributed to the use of a separate translation system, which is typically trained on large amounts of parallel data not seen by the model. In this work, we introduce a new approach called self-translate, which overcomes the need for an external translation system by leveraging the few-shot translation capabilities of multilingual language models. Experiments over 5...

10.48550/arxiv.2308.01223 preprint EN other-oa arXiv (Cornell University) 2023-01-01
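
Self-translate, as described above, replaces the external MT system with the model's own few-shot translation ability: the model first translates its input into English, then solves the task on the translation. A minimal two-step sketch follows; the model, prompts, and downstream task are illustrative, not the paper's exact setup.

```python
# Sketch of self-translate: translate the input with the model itself via a
# few-shot prompt, then run the task prompt on the English translation.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"  # small multilingual stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def generate(prompt: str, max_new_tokens: int = 40) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

# Step 1: few-shot translation prompt, using the model itself (no external MT).
translate_prompt = (
    "Basque: Kaixo, zer moduz?\nEnglish: Hello, how are you?\n"
    "Basque: Eguraldi ona dago gaur.\nEnglish:"
)
english_input = generate(translate_prompt).split("\n")[0].strip()

# Step 2: run the actual task prompt over the self-translated English input.
task_prompt = f"Sentence: {english_input}\nIs the sentiment positive or negative?\nAnswer:"
print(generate(task_prompt, max_new_tokens=5))
```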

In this position paper, we argue that the classical evaluation of Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark and then evaluated on the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contamination causes an overestimation of the performance of the contaminated model on the target benchmark and the associated task with respect to their non-contaminated...

10.48550/arxiv.2310.18018 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01