- Topic Modeling
- Natural Language Processing Techniques
- Sentiment Analysis and Opinion Mining
- Text Readability and Simplification
- Advanced Text Analysis Techniques
- Speech and Dialogue Systems
- Semantic Web and Ontologies
- Text and Document Classification Technologies
- Hate Speech and Cyberbullying Detection
- Multimodal Machine Learning Applications
- Misinformation and Its Impacts
- Mental Health via Writing
- Social Media and Politics
- Biomedical Text Mining and Ontologies
- Digital Mental Health Interventions
- Language, Metaphor, and Cognition
- Speech Recognition and Synthesis
- Digital Communication and Language
- Lexicography and Language Studies
- Linguistics and Terminology Studies
- Opinion Dynamics and Social Influence
- Advanced Graph Neural Networks
- Computational and Text Analysis Methods
- Swearing, Euphemism, Multilingualism
- Internet Traffic Analysis and Secure E-voting
Cardiff University
2016-2024
IT University of Copenhagen
2023
Tokyo Institute of Technology
2023
Administration for Community Living
2023
American Jewish Committee
2023
Amazon (United States)
2023
University of Liverpool
2023
Universitat Pompeu Fabra
2018-2021
Bar-Ilan University
2021
University of Helsinki
2021
The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, neither a strong set of baselines trained on such domain-specific data. In this paper, we propose a framework (TweetEval) consisting of seven heterogeneous...
Word Sense Disambiguation is a long-standing task in Natural Language Processing, lying at the core of human language understanding. However, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable evaluation framework. In this paper we develop a unified evaluation framework and analyze the performance of various systems in a fair setup. The results show that supervised systems clearly outperform knowledge-based models. Among the supervised systems, a linear classifier trained on conventional local features still proves to be a hard baseline...

Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through...
Mohammad Taher Pilehvar, Jose Camacho-Collados. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given...
This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results...
Francesco Barbieri, Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa-Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, Horacio Saggion. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.
Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our...
José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.
Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a model to train and evaluate multilingual language models in Twitter. In this paper we provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty...
By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for...
Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, Horacio Saggion. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.
Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate among others on current topics and promote their political agendas aiming to influence voter behaviour. In this paper, we attempt to analyse tweets of politicians from three European countries and explore the virality of their tweets. Previous...
This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The objective of Visual-WSD is to identify among a set of ten images the one that corresponds to the intended meaning of a given ambiguous word which is accompanied with minimal context. The task provides datasets for three different languages: English, Italian, and Farsi. We received a total of 96 submissions. Out of these, 40 systems outperformed a strong zero-shot CLIP-based baseline. Participating systems proposed zero- and few-shot approaches, often...
Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach...
Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. By applying our transformation our aim is to obtain...
Language model (LM) pretraining has led to consistent improvements in many NLP downstream tasks, including named entity recognition (NER). In this paper, we present T-NER (Transformer-based Named Entity Recognition), a Python library for NER LM finetuning. In addition to its practical utility, T-NER facilitates the study and investigation of the cross-domain and cross-lingual generalization ability of LMs finetuned on NER. Our library also provides a web app where users can get predictions interactively for arbitrary text,...
Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with...
Background: Free-text clinical data that is unstructured and narrative in nature can provide a rich source of patient information. However, the information contained within routinely collected health data is typically captured as free-text, and extracting research quality phenotypes from these data remains a challenge. Manually reviewing free-text notes is a time-consuming process and not suitable for large scale datasets. On the other hand, automatically extracting phenotypes can be a challenging task due to medical researchers...
José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.
Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine...
José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.