Ander Salaberria

ORCID: 0000-0002-4277-3939
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Speech and dialogue systems
  • Domain Adaptation and Few-Shot Learning
  • Video Analysis and Summarization
  • Human Motion and Animation
  • Advanced Text Analysis Techniques
  • Robotics and Automated Systems
  • Language, Metaphor, and Cognition
  • Data Quality and Management
  • Data Visualization and Analytics

Donostia International Physics Center
2025

University of the Basque Country
2022-2023

Integrating outside knowledge for reasoning in visio-linguistic tasks such as visual question answering (VQA) is an open problem. Given that pretrained language models have been shown to include world knowledge, we propose to use a unimodal (text-only) train and inference procedure based on automatic off-the-shelf captioning of images and pretrained language models. More specifically, we verbalize the image contents to allow language models to better leverage their implicit knowledge and solve knowledge-intensive tasks. Focusing on a task which requires...

10.1016/j.eswa.2022.118669 article EN cc-by Expert Systems with Applications 2022-08-28
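
As a rough illustration of the caption-then-answer idea described above, the sketch below verbalizes an image with an off-the-shelf captioner and then prompts a text-only language model with the caption and the question. The Hugging Face pipelines and model names are placeholders, not the setup used in the paper.

# Minimal sketch of the caption-then-answer idea: verbalize the image with an
# off-the-shelf captioner, then let a text-only language model answer the question.
# Model names and the image path are placeholders, not the paper's checkpoints.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
reader = pipeline("text2text-generation", model="google/flan-t5-base")

def answer_from_image(image_path: str, question: str) -> str:
    caption = captioner(image_path)[0]["generated_text"]            # verbalize the image
    prompt = f"Context: {caption}\nQuestion: {question}\nAnswer:"    # text-only inference
    return reader(prompt, max_new_tokens=20)[0]["generated_text"]

print(answer_from_image("photo.jpg", "What sport is being played?"))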

Extended Reality (XR) is evolving rapidly, offering new paradigms for human-computer interaction. This position paper argues that integrating Large Language Models (LLMs) with XR systems represents a fundamental shift toward more intelligent, context-aware, and adaptive mixed-reality experiences. We propose a structured framework built on three key pillars: (1) Perception and Situational Awareness, (2) Knowledge Modeling and Reasoning, and (3) Visualization and Interaction. We believe that leveraging LLMs within...

10.1109/mcg.2025.3548554 article EN cc-by IEEE Computer Graphics and Applications 2025-01-01
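
A minimal, purely illustrative sketch of how the three pillars could be wired together in code; the class and function names are hypothetical and do not come from the paper.

# Illustrative sketch of the three-pillar loop described above; all names are
# hypothetical, not an API from the paper.
from dataclasses import dataclass

@dataclass
class SceneState:              # Pillar 1: Perception and Situational Awareness
    visible_objects: list[str]
    user_gaze_target: str

def reason_over_scene(state: SceneState, llm) -> str:
    # Pillar 2: Knowledge Modeling and Reasoning (LLM call stubbed out)
    prompt = (f"The user is looking at {state.user_gaze_target}. "
              f"Visible objects: {', '.join(state.visible_objects)}. "
              "Suggest one context-aware assistance action.")
    return llm(prompt)

def render_overlay(suggestion: str) -> None:
    # Pillar 3: Visualization and Interaction (replace with the XR engine's UI call)
    print(f"[XR overlay] {suggestion}")

state = SceneState(["engine block", "torque wrench"], "engine block")
render_overlay(reason_over_scene(state, llm=lambda p: f"(LLM reply to) {p[:50]}..."))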

Named Entity Recognition (NER) is a core natural language processing task in which pre-trained models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking each candidate to an existing knowledge base;...

10.18653/v1/2023.semeval-1.186 article EN cc-by Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) 2023-01-01
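
The cascade described above can be pictured as three chained stages; the stubs below only mark where the span detector, entity linker, and fine-grained classifier would plug in, and are not the system's actual components.

# Schematic of the three-step cascade: each stage is a stub to be replaced by the
# corresponding model (candidate span detector, entity linker, fine-grained classifier).
def identify_candidates(sentence: str) -> list[str]:
    # Step 1: detect candidate entity spans in the input sentence (toy heuristic stub).
    return [tok for tok in sentence.split() if tok[0].isupper()]

def link_to_kb(candidate: str) -> str | None:
    # Step 2: link each candidate to an existing knowledge base (toy lookup stub).
    kb = {"Bilbao": "Q8692", "Wikipedia": "Q52"}
    return kb.get(candidate)

def classify_fine_grained(candidate: str, kb_id: str | None) -> str:
    # Step 3: assign a fine-grained type using the candidate and its KB entry (stub).
    return "Location" if kb_id == "Q8692" else "Other"

sentence = "The museum in Bilbao is described on Wikipedia"
for cand in identify_candidates(sentence):
    kb_id = link_to_kb(cand)
    print(cand, kb_id, classify_fine_grained(cand, kb_id))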

Existing work has observed that current text-to-image systems do not accurately reflect explicit spatial relations between objects such as 'left of' or 'below'. We hypothesize that this is because explicit spatial relations rarely appear in the image captions used to train these models. We propose an automatic method that, given existing images, generates synthetic captions that contain 14 explicit spatial relations. We introduce the Spatial Relation for Generation (SR4G) dataset, which contains 9.9 million image-caption pairs for training, and more than 60 thousand...

10.48550/arxiv.2403.00587 preprint EN arXiv (Cornell University) 2024-03-01
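
A minimal sketch of how an explicit spatial-relation caption could be derived automatically from two detected objects, in the spirit of the method described above; the (x1, y1, x2, y2) box format and the relation wording are assumptions.

# Minimal sketch: derive an explicit spatial-relation caption from two objects
# and their bounding boxes (assumed (x1, y1, x2, y2) pixel format).
def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def spatial_caption(obj_a, box_a, obj_b, box_b):
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    if abs(ax - bx) >= abs(ay - by):                  # horizontal separation dominates
        relation = "to the left of" if ax < bx else "to the right of"
    else:                                             # vertical separation dominates
        relation = "above" if ay < by else "below"    # image y-axis grows downwards
    return f"A {obj_a} {relation} a {obj_b}."

print(spatial_caption("dog", (10, 120, 80, 200), "bench", (150, 100, 300, 210)))
# -> "A dog to the left of a bench."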

Existing Vision-Language Compositionality (VLC) benchmarks like SugarCrepe are formulated as image-to-text retrieval problems, where, given an image, the models need to select between the correct textual description and a synthetic hard negative text. In this work we present the Bidirectional Vision-Language Compositionality (BiVLC) dataset. The novelty of BiVLC is to add a synthetic hard negative image generated from the synthetic text, resulting in two image-to-text retrieval examples (one for each image) and, more importantly, two text-to-image retrieval examples (one for each text). Human annotators filter out ill-formed examples, ensuring...

10.48550/arxiv.2406.09952 preprint EN arXiv (Cornell University) 2024-06-14
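
The sketch below shows how one BiVLC-style item (image, caption, hard-negative text, hard-negative image) expands into the two image-to-text and two text-to-image retrieval instances mentioned above; the field names are illustrative, not the dataset's schema.

# Sketch: expand one BiVLC-style item into its four retrieval instances
# (two image-to-text and two text-to-image). Field names are illustrative.
def expand_item(item):
    img, txt = item["image"], item["caption"]
    neg_txt, neg_img = item["negative_caption"], item["negative_image"]
    return [
        {"direction": "image-to-text", "query": img,     "positive": txt,     "negative": neg_txt},
        {"direction": "image-to-text", "query": neg_img, "positive": neg_txt, "negative": txt},
        {"direction": "text-to-image", "query": txt,     "positive": img,     "negative": neg_img},
        {"direction": "text-to-image", "query": neg_txt, "positive": neg_img, "negative": img},
    ]

item = {"image": "coco_001.jpg", "caption": "a dog chasing a ball",
        "negative_caption": "a ball chasing a dog", "negative_image": "generated_001.png"}
for instance in expand_item(item):
    print(instance["direction"], "| query:", instance["query"])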

Named Entity Recognition (NER) is a core natural language processing task in which pre-trained models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking each candidate to an existing knowledge base;...

10.48550/arxiv.2304.10637 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

The combination of visual and textual representations has produced excellent results in tasks such as image captioning and visual question answering, but the inference capabilities of multimodal representations are largely untested. In the case of textual representations, inference tasks such as Textual Entailment and Semantic Textual Similarity have often been used to benchmark the quality of representations. The long term goal of our research is to devise multimodal representation techniques that improve current inference capabilities. We thus present a novel task, Visual Semantic Textual Similarity (vSTS), where such inference ability can be...

10.48550/arxiv.2004.01894 preprint EN other-oa arXiv (Cornell University) 2020-01-01
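
One simple way to picture a vSTS-style scorer is to embed each (image, caption) item and compare the combined vectors with cosine similarity, as sketched below; the random vectors stand in for whatever pretrained image and text encoders are used, and are not the paper's models.

# Sketch of a vSTS-style scorer: embed each (image, caption) item by concatenating
# image and text vectors, then compare items with cosine similarity.
import numpy as np

def combine(image_vec: np.ndarray, text_vec: np.ndarray) -> np.ndarray:
    return np.concatenate([image_vec, text_vec])     # joint multimodal representation

def vsts_score(item_a, item_b) -> float:
    a, b = combine(*item_a), combine(*item_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))   # cosine similarity

rng = np.random.default_rng(0)
item_1 = (rng.normal(size=512), rng.normal(size=300))   # (image_vec, text_vec) stand-ins
item_2 = (rng.normal(size=512), rng.normal(size=300))
print(f"similarity: {vsts_score(item_1, item_2):.3f}")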

This paper shows that text-only Language Models (LM) can learn to ground spatial relations like "left of" or "below" if they are provided with explicit location information of objects and are properly trained to leverage those locations. We perform experiments on a verbalized version of the Visual Spatial Reasoning (VSR) dataset, where images are coupled with textual statements which contain real or fake spatial relations between two objects of the image. We verbalize the images using an off-the-shelf object detector, adding location tokens to every object label to represent their...

10.1016/j.neunet.2023.11.031 article EN cc-by Neural Networks 2023-11-17
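
A small sketch of the verbalization idea: detector output is rendered as object labels followed by discretized location tokens so that a text-only LM can reason about where objects are. The <loc_*> token format and binning are assumptions, not the paper's exact scheme.

# Sketch of image verbalization with location tokens: each detected object becomes
# its label plus discretized bounding-box tokens, producing a text-only input.
def location_tokens(box, image_size, bins=100):
    w, h = image_size
    x1, y1, x2, y2 = box
    coords = (x1 / w, y1 / h, x2 / w, y2 / h)         # normalize to [0, 1]
    return " ".join(f"<loc_{int(c * (bins - 1))}>" for c in coords)

def verbalize(detections, image_size):
    parts = [f"{label} {location_tokens(box, image_size)}" for label, box in detections]
    return " ; ".join(parts)

detections = [("cat", (30, 200, 180, 330)), ("sofa", (0, 150, 480, 360))]
statement = "The cat is on the sofa."
print(verbalize(detections, image_size=(480, 360)))
print("Statement:", statement)   # the LM is trained to judge real vs. fake relations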