- Natural Language Processing Techniques
- Topic Modeling
- Sentiment Analysis and Opinion Mining
- Semantic Web and Ontologies
- Biomedical Text Mining and Ontologies
- Advanced Text Analysis Techniques
- Text Readability and Simplification
- Data Quality and Management
- Speech and Dialogue Systems
- Hate Speech and Cyberbullying Detection
- Internet Traffic Analysis and Secure E-voting
- Spam and Phishing Detection
- Text and Document Classification Technologies
- Web Data Mining and Analysis
- Linguistic Studies and Language Acquisition
- Misinformation and Its Impacts
- Spanish Linguistics and Language Studies
- Linguistics and Terminology Studies
- Cybercrime and Law Enforcement Studies
- Wikis in Education and Collaboration
- Digital Marketing and Social Media
- Personal Information Management and User Behavior
- Software Engineering Research
- Government, Law, and Information Management
- Machine Learning in Healthcare
Vicomtech
2014-2025
Tencent (China)
2021
University of the Basque Country
2006-2020
Universitat Politècnica de Catalunya
2007-2009
Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic. Due to the massive rise of user-generated web content on social media, the amount of hate speech is also steadily increasing. Over the past years, interest in online hate speech detection and, particularly, in the automation of this task has continuously grown, along with the societal impact of the phenomenon. This paper...
The Web is a huge virtual space where people express and share individual opinions, influencing every aspect of life, with implications for marketing and communication alike. Social Media are influencing consumers’ preferences by shaping their attitudes and behaviors. Monitoring Social Media activity is a good way to measure customers’ loyalty and to keep track of their sentiment towards brands or products. Social Media are the next logical marketing arena. Currently, Facebook dominates the digital marketing space, followed closely by Twitter. This paper describes a Sentiment Analysis...
Extended Reality (XR) is evolving rapidly, offering new paradigms for human-computer interaction. This position paper argues that integrating Large Language Models (LLMs) with XR systems represents a fundamental shift toward more intelligent, context-aware, and adaptive mixed-reality experiences. We propose a structured framework built on three key pillars: (1) Perception and Situational Awareness, (2) Knowledge Modeling and Reasoning, and (3) Visualization and Interaction. We believe leveraging LLMs within...
This paper presents a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. KnowNet, the resulting knowledge base, which connects semantically related concepts, is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any...
This paper presents our participation in SemEval-2015 Task 12 (Aspect Based Sentiment Analysis). We participated using only unsupervised or weakly supervised approaches. Our attempt is based on requiring as little annotated or hand-crafted content as possible, and avoids training a model on the provided training set. We use continuous word representations (Word2Vec) to leverage in-domain semantic similarities between words for many of the subtasks involved.
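The abstract does not say exactly how the word representations are used, so the following is only a minimal sketch of the underlying idea, assuming cosine similarity between word vectors as the measure of semantic similarity. The vectors are hand-written toy values standing in for Word2Vec embeddings trained on in-domain text, and the helpers `most_similar` and `assign_category` are hypothetical, not the system's actual code.

```python
import math

# Toy 3-dimensional vectors (hypothetical values for illustration only).
# A real setup would use Word2Vec vectors trained on in-domain raw text.
EMBEDDINGS = {
    "food":    [0.9, 0.1, 0.0],
    "pizza":   [0.8, 0.2, 0.1],
    "pasta":   [0.85, 0.15, 0.05],
    "service": [0.1, 0.9, 0.1],
    "waiter":  [0.2, 0.8, 0.2],
    "table":   [0.3, 0.4, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=2):
    """Rank the other vocabulary words by cosine similarity to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def assign_category(word, seeds, embeddings):
    """Assign `word` to the seed word (category) whose vector is closest."""
    return max(seeds, key=lambda seed: cosine(embeddings[word], embeddings[seed]))
```

Assigning a word to its closest seed, for example `assign_category("waiter", ["food", "service"], EMBEDDINGS)`, illustrates one way similarity alone could support a subtask such as aspect category classification without training a classifier on annotated data.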
This paper presents an empirical evaluation of the quality of publicly available large-scale knowledge resources. The study includes a wide range of manually and automatically derived large-scale knowledge resources. In order to establish a fair and neutral comparison, each resource is indirectly evaluated using the same method on a Word Sense Disambiguation task. The evaluation framework selected was the Senseval-3 English Lexical Sample Task. The study empirically demonstrates that automatically acquired knowledge resources surpass manually derived ones in terms of both precision and recall, and that the combination...
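Precision and recall in this kind of Word Sense Disambiguation evaluation usually follow the Senseval convention, where a system may leave some instances unanswered: precision is computed over the attempted instances and recall over all test instances. A minimal sketch of that scoring, assuming predictions and gold senses are dictionaries keyed by instance id and that `None` marks an abstention (the function name and sense labels are hypothetical):

```python
def score_wsd(predictions, gold):
    """Return (precision, recall, F1) for word sense disambiguation output.

    predictions: instance id -> predicted sense, or None if the system abstains.
    gold:        instance id -> correct sense.
    Instances missing from `predictions` count as unanswered.
    """
    attempted = [inst for inst, sense in predictions.items() if sense is not None]
    correct = sum(1 for inst in attempted if gold.get(inst) == predictions[inst])
    precision = correct / len(attempted) if attempted else 0.0  # over answered items
    recall = correct / len(gold) if gold else 0.0               # over all items
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {"bank.1": "bank_1", "bank.2": "bank_2", "bank.3": "bank_1", "bank.4": "bank_2"}
predictions = {"bank.1": "bank_1", "bank.2": "bank_1", "bank.3": "bank_1", "bank.4": None}
p, r, f = score_wsd(predictions, gold)  # p = 2/3, r = 1/2, f = 4/7
```

Because abstentions lower recall but not precision, a resource with narrow coverage can look precise while answering few instances, which is why both figures are reported.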
This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of ongoing research and currently consists of 29,682 sentences obtained from anonymised health records and annotated with negation and uncertainty. The article includes an exhaustive comparison with similar corpora in Spanish, and presents the main annotation and design decisions. Additionally, we perform preliminary experiments using deep learning algorithms to validate the dataset. As far as we know,...
This paper presents V3, an unsupervised system for aspect-based Sentiment Analysis, as evaluated on SemEval 2014 Task 4. V3 focuses on generating a list of aspect terms for a new domain using a collection of raw texts from that domain. We also implement a very basic approach to classify the aspect terms into categories and assign polarities to them.
Massive digital data processing provides a wide range of opportunities and benefits, but at the cost of endangering personal privacy. Anonymisation consists of removing or replacing sensitive information in data, enabling its exploitation for different purposes while preserving the privacy of individuals. Over the years, many automatic anonymisation systems have been proposed; however, depending on the type of data, the target language, or the availability of training documents, the task still remains challenging. The emergence of novel...
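As a toy illustration of the two operations the abstract names, removing or replacing sensitive information, the sketch below substitutes matches of two hypothetical regular expressions with placeholder tags, or deletes them. The patterns and the `anonymise` function are assumptions made for the example; they cover only e-mail addresses and one phone-number shape, and do not reflect any particular system.

```python
import re

# Hypothetical detection patterns (illustrative, far from exhaustive).
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")),
]


def anonymise(text, mode="replace"):
    """Replace sensitive spans with a [LABEL] tag, or remove them entirely.

    mode="replace" keeps a typed placeholder so downstream tools still see
    that something was there; mode="remove" simply deletes the span.
    """
    for label, pattern in PATTERNS:
        replacement = f"[{label}]" if mode == "replace" else ""
        text = pattern.sub(replacement, text)
    return text


print(anonymise("Contact ane@example.org or 943 123 456."))
```

Replacement is usually more useful than deletion when the anonymised text will be processed further, because the placeholder preserves the sentence structure and the type of the hidden entity.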
This paper describes a web-based application to design and answer exercises for language learning. It is available in Basque, Spanish, English, and French. Based on open-source Natural Language Processing (NLP) technology such as word embedding models and word sense disambiguation, the application enables users to create, automatically, easily and in real time, three types of exercises, namely Fill-in-the-Gaps, Multiple Choice, and Shuffled Sentences questionnaires. These are generated from texts of the users’ own choice, so they can train...
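The mechanical part of two of the exercise types can be sketched in a few lines. This is a minimal illustration under stated assumptions: the target words are passed in by hand, whereas the abstract attributes word selection (and, for Multiple Choice, distractor generation) to NLP technology such as word embeddings and word sense disambiguation, which is not modelled here. The function names are hypothetical.

```python
import random
import re


def fill_in_the_gaps(text, targets, gap="_____"):
    """Replace each occurrence of a target word with a numbered gap.

    Matching is case-sensitive and on whole words. Returns the exercise
    text and the answer key (gap number -> original word).
    """
    if not targets:
        return text, {}
    answers = {}

    def blank(match):
        answers[len(answers) + 1] = match.group(0)
        return f"({len(answers)}) {gap}"

    pattern = re.compile(r"\b(" + "|".join(map(re.escape, targets)) + r")\b")
    return pattern.sub(blank, text), answers


def shuffled_sentence(sentence, seed=0):
    """Shuffle the tokens of a sentence; the learner must restore the order."""
    tokens = sentence.split()
    rng = random.Random(seed)
    shuffled = tokens[:]
    # Reshuffle until the order differs (only possible with 2+ distinct tokens).
    while len(set(tokens)) > 1 and shuffled == tokens:
        rng.shuffle(shuffled)
    return shuffled


exercise, key = fill_in_the_gaps("The cat sat on the mat.", ["cat", "mat"])
print(exercise)  # The (1) _____ sat on the (2) _____.
print(key)       # {1: 'cat', 2: 'mat'}
```

Using a seeded `random.Random` instance keeps a given questionnaire reproducible, which is convenient if the same exercise has to be regenerated for checking answers.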
Motivation: Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. In spite of English being the most widely addressed language in the field, in recent years there has been a growing interest from the natural language processing community in dealing with languages other than English. However, the availability of resources and tools for the appropriate treatment of non-English texts is lagging behind. Our research is concerned with the semantic annotation of biomedical texts in the Spanish language, which can...
This task tries to establish the relative quality of available semantic resources (derived by manual or automatic means). The quality of each large-scale knowledge resource is indirectly evaluated on a Word Sense Disambiguation task. In particular, we use the Senseval-3 and SemEval-2007 English Lexical Sample tasks as evaluation benchmarks to evaluate the relative quality of each resource. Furthermore, trying to be as neutral as possible with respect to the knowledge bases studied, we systematically apply the same disambiguation method to all the resources. A completely...
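The key design point is that only the resource changes while the disambiguation method stays fixed. A minimal sketch of that setup, assuming each resource can be reduced to a weighted bag of related words per sense and using simple weighted word overlap with the context as the shared method; this overlap scheme, the data layout, and the function names are illustrative assumptions, not necessarily the exact method used in the task.

```python
from collections import Counter


def disambiguate(context_words, sense_signatures):
    """Pick the sense whose related-word signature best overlaps the context.

    sense_signatures: sense id -> {related word: weight}.
    Returns None (abstain) if no sense overlaps or the best score is tied.
    """
    context = Counter(context_words)
    scores = {
        sense: sum(weight * context[word] for word, weight in signature.items())
        for sense, signature in sense_signatures.items()
    }
    best = max(scores.values(), default=0)
    winners = [sense for sense, score in scores.items() if score == best]
    if best <= 0 or len(winners) != 1:
        return None
    return winners[0]


def evaluate_resource(resource, test_items):
    """Apply the same disambiguator to every test item using one resource.

    resource:   target word -> sense signatures (as in `disambiguate`).
    test_items: item id -> (target word, list of context words).
    """
    return {
        item_id: disambiguate(context, resource.get(target, {}))
        for item_id, (target, context) in test_items.items()
    }


# Two hypothetical resources with different coverage of the same senses.
RESOURCE_A = {"bank": {"bank_1": {"money": 1.0, "loan": 1.0},
                       "bank_2": {"river": 1.0, "water": 1.0}}}
RESOURCE_B = {"bank": {"bank_1": {"money": 1.0},
                       "bank_2": {"river": 0.5}}}
TEST_ITEMS = {
    "t1": ("bank", ["the", "loan", "from", "the", "bank"]),
    "t2": ("bank", ["walk", "along", "the", "river", "bank"]),
}
```

Since the method is held constant, any difference between `evaluate_resource(RESOURCE_A, TEST_ITEMS)` and `evaluate_resource(RESOURCE_B, TEST_ITEMS)` can be attributed to the resources themselves; here the sparser resource abstains on `t1`.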