José Camacho-Collados

ORCID: 0000-0003-1618-7239
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Sentiment Analysis and Opinion Mining
  • Text Readability and Simplification
  • Advanced Text Analysis Techniques
  • Speech and Dialogue Systems
  • Semantic Web and Ontologies
  • Text and Document Classification Technologies
  • Hate Speech and Cyberbullying Detection
  • Multimodal Machine Learning Applications
  • Misinformation and Its Impacts
  • Mental Health via Writing
  • Social Media and Politics
  • Biomedical Text Mining and Ontologies
  • Digital Mental Health Interventions
  • Language, Metaphor, and Cognition
  • Speech Recognition and Synthesis
  • Digital Communication and Language
  • Lexicography and Language Studies
  • Linguistics and Terminology Studies
  • Opinion Dynamics and Social Influence
  • Advanced Graph Neural Networks
  • Computational and Text Analysis Methods
  • Swearing, Euphemism, Multilingualism
  • Internet Traffic Analysis and Secure E-voting

Cardiff University
2016-2024

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

American Jewish Committee
2023

Amazon (United States)
2023

University of Liverpool
2023

Universitat Pompeu Fabra
2018-2021

Bar-Ilan University
2021

University of Helsinki
2021

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a framework (TweetEval) consisting of seven heterogeneous...

10.18653/v1/2020.findings-emnlp.148 article EN cc-by 2020-01-01

Word Sense Disambiguation is a long-standing task in Natural Language Processing, lying at the core of human language understanding. However, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable framework. In this paper we develop a unified framework and analyze the performance of various systems in a fair setup. The results show that supervised systems clearly outperform knowledge-based models. Among the supervised systems, a linear classifier trained on conventional local features still proves to be a hard baseline...

10.18653/v1/e17-1010 article EN cc-by 2017-01-01


Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through...

10.1613/jair.1.11259 article EN cc-by Journal of Artificial Intelligence Research 2018-12-06

Mohammad Taher Pilehvar, Jose Camacho-Collados. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1128 article EN 2019-01-01

One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given...

10.1609/aaai.v34i05.6242 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results...

10.18653/v1/s17-2002 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

Francesco Barbieri, Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa-Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, Horacio Saggion. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.

10.18653/v1/s18-1003 article EN cc-by 2018-01-01

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our...

10.18653/v1/w18-5406 article EN cc-by 2018-01-01

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.

10.3115/v1/n15-1059 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a model to train and evaluate multilingual language models in Twitter. In this paper we provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty...

10.48550/arxiv.2104.12250 preprint EN other-oa arXiv (Cornell University) 2021-01-01

By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for...

10.48550/arxiv.1808.09121 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, Horacio Saggion. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.

10.18653/v1/s18-1115 article EN cc-by 2018-01-01

Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate among others on current topics and promote their political agendas aiming to influence voter behaviour. In this paper, we attempt to analyse tweets of politicians from three European countries and explore the virality of their tweets. Previous...

10.1016/j.osnem.2023.100242 article EN cc-by Online Social Networks and Media 2023-01-01

This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The objective of Visual-WSD is to identify among a set of ten images the one that corresponds to the intended meaning of a given ambiguous word which is accompanied with minimal context. The task provides datasets for three different languages: English, Italian, and Farsi. We received a total of 96 submissions. Out of these, 40 systems outperformed a strong zero-shot CLIP-based baseline. Participating systems proposed different zero- and few-shot approaches, often...

10.18653/v1/2023.semeval-1.308 article EN cc-by Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) 2023-01-01

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach...

10.18653/v1/k17-1012 article EN cc-by 2017-01-01

Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. By applying this transformation our aim is to obtain...

10.18653/v1/d18-1027 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

Language model (LM) pretraining has led to consistent improvements in many NLP downstream tasks, including named entity recognition (NER). In this paper, we present T-NER (Transformer-based Named Entity Recognition), a Python library for NER LM finetuning. In addition to its practical utility, T-NER facilitates the study and investigation of the cross-domain and cross-lingual generalization ability of LMs finetuned on NER. Our library also provides a web app where users can get model predictions interactively for arbitrary text,...

10.18653/v1/2021.eacl-demos.7 preprint EN cc-by 2021-01-01

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with...

10.1162/coli_a_00405 article EN cc-by-nc-nd Computational Linguistics 2021-03-29

Background: Free-text clinical data that is unstructured and narrative in nature can provide a rich source of patient information. However, the information contained within routinely collected health data is typically captured as free-text, and extracting research quality phenotypes from these data remains a challenge. Manually reviewing free-text notes is a time-consuming process not suitable for large scale datasets. On the other hand, automatically extracting them can be a challenging task due to medical researchers...

10.2196/preprints.72256 preprint EN cc-by 2025-02-06

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2001 article EN cc-by 2015-01-01

Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine...

10.18653/v1/p17-1170 preprint EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.

10.3115/v1/p15-1072 article EN cc-by 2015-01-01