NFDI4DS | UHH-SEMS - Publication Details

José Camacho-Collados

ORCID: 0000-0003-1618-7239

About

Contact & Profiles

A5086289154

Research Areas

Topic Modeling
Natural Language Processing Techniques
Sentiment Analysis and Opinion Mining
Text Readability and Simplification
Advanced Text Analysis Techniques
Speech and dialogue systems
Semantic Web and Ontologies
Text and Document Classification Technologies
Hate Speech and Cyberbullying Detection
Multimodal Machine Learning Applications
Misinformation and Its Impacts
Mental Health via Writing
Social Media and Politics
Biomedical Text Mining and Ontologies
Digital Mental Health Interventions
Language, Metaphor, and Cognition
Speech Recognition and Synthesis
Digital Communication and Language
Lexicography and Language Studies
linguistics and terminology studies
Opinion Dynamics and Social Influence
Advanced Graph Neural Networks
Computational and Text Analysis Methods
Swearing, Euphemism, Multilingualism
Internet Traffic Analysis and Secure E-voting

Cardiff University
2016-2024

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

American Jewish Committee
2023

Amazon (United States)
2023

University of Liverpool
2023

Universitat Pompeu Fabra
2018-2021

Bar-Ilan University
2021

University of Helsinki
2021

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

OPENALEX - Publications

Francesco Barbieri José Camacho-Collados Luis Espinosa-Anke Leonardo Neves

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, neither a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous...

10.18653/v1/2020.findings-emnlp.148 article EN cc-by 2020-01-01

Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison

Alessandro Raganato José Camacho-Collados Roberto Navigli

Word Sense Disambiguation is a long-standing task in Natural Language Processing, lying at the core of human language understanding. However, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable evaluation framework. In this paper we develop a unified evaluation framework and analyze the performance of various Word Sense Disambiguation systems in a fair setup. The results show that supervised systems clearly outperform knowledge-based models. Among the supervised systems, a linear classifier trained on conventional local features still proves to be a hard baseline...

10.18653/v1/e17-1010 article EN cc-by 2017-01-01

From Word To Sense Embeddings: A Survey on Vector Representations of Meaning

José Camacho-Collados Mohammad Taher Pilehvar

Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through...

10.1613/jair.1.11259 article EN cc-by Journal of Artificial Intelligence Research 2018-12-06

Mohammad Taher Pilehvar José Camacho-Collados

Mohammad Taher Pilehvar, Jose Camacho-Collados. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1128 article EN 2019-01-01

Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

10.1016/j.artint.2016.07.005 article EN publisher-specific-oa Artificial Intelligence 2016-08-17

Inducing Relational Knowledge from BERT

Zied Bouraoui José Camacho-Collados Steven Schockaert

One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given...

10.1609/aaai.v34i05.6242 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity

José Camacho-Collados Mohammad Taher Pilehvar Nigel Collier Roberto Navigli

This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results...

10.18653/v1/s17-2002 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

SemEval 2018 Task 2: Multilingual Emoji Prediction

Francesco Barbieri José Camacho-Collados Francesco Ronzano Luis Espinosa-Anke Miguel Ballesteros and 3 more

Francesco Barbieri, Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa-Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, Horacio Saggion. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.

10.18653/v1/s18-1003 article EN cc-by 2018-01-01

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

José Camacho-Collados Mohammad Taher Pilehvar

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our...

10.18653/v1/w18-5406 article EN cc-by 2018-01-01

Knowledge-enhanced document embeddings for text classification

Roberta Akemi Sinoara José Camacho-Collados Rafael Geraldeli Rossi Roberto Navigli Solange Oliveira Rezende

10.1016/j.knosys.2018.10.026 article EN Knowledge-Based Systems 2018-10-20

NASARI: a Novel Approach to a Semantically-Aware Representation of Items

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.

10.3115/v1/n15-1059 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond

Francesco Barbieri Luis Espinosa-Anke José Camacho-Collados

Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a model to train and evaluate multilingual language models in Twitter. In this paper we provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty...

10.48550/arxiv.2104.12250 preprint EN other-oa arXiv (Cornell University) 2021-01-01

WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations

Mohammad Taher Pilehvar José Camacho-Collados

By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for...

10.48550/arxiv.1808.09121 preprint EN other-oa arXiv (Cornell University) 2018-01-01

SemEval-2018 Task 9: Hypernym Discovery

José Camacho-Collados Claudio Delli Bovi Luis Espinosa-Anke Sergio Oramas Tommaso Pasini and 4 more

Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, Horacio Saggion. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.

10.18653/v1/s18-1115 article EN cc-by 2018-01-01

Negativity spreads faster: A large-scale multilingual twitter analysis on the role of sentiment in political communication

Dimosthenis Antypas Alun Preece José Camacho-Collados

Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate among others on current topics and promote their political agendas aiming to influence voter behaviour. In this paper, we attempt to analyse tweets of politicians from three European countries and explore the virality of their tweets. Previous...

10.1016/j.osnem.2023.100242 article EN cc-by Online Social Networks and Media 2023-01-01

SemEval-2023 Task 1: Visual Word Sense Disambiguation

Alessandro Raganato Iacer Calixto Asahi Ushio José Camacho-Collados Mohammad Taher Pilehvar

This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The objective of Visual-WSD is to identify among a set of ten images the one that corresponds to the intended meaning of a given ambiguous word which is accompanied with minimal context. The task provides datasets for three different languages: English, Italian, and Farsi. We received a total of 96 submissions. Out of these, 40 systems outperformed a strong zero-shot CLIP-based baseline. Participating systems proposed zero- and few-shot approaches, often...

10.18653/v1/2023.semeval-1.308 article EN cc-by Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) 2023-01-01

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Massimiliano Mancini José Camacho-Collados Ignacio Iacobacci Roberto Navigli

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach...

10.18653/v1/k17-1012 article EN cc-by 2017-01-01

Improving Cross-Lingual Word Embeddings by Meeting in the Middle

Yerai Doval José Camacho-Collados Luis Espinosa-Anke Steven Schockaert

Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. By applying this transformation our aim is to obtain...

10.18653/v1/d18-1027 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition

Asahi Ushio José Camacho-Collados

Language model (LM) pretraining has led to consistent improvements in many NLP downstream tasks, including named entity recognition (NER). In this paper, we present T-NER (Transformer-based Named Entity Recognition), a Python library for NER LM finetuning. In addition to its practical utility, T-NER facilitates the study and investigation of the cross-domain and cross-lingual generalization ability of LMs finetuned on NER. Our library also provides a web app where users can get model predictions interactively for arbitrary text,...

10.18653/v1/2021.eacl-demos.7 preprint EN cc-by 2021-01-01

Analysis and Evaluation of Language Models for Word Sense Disambiguation

Daniel Loureiro Kiamehr Rezaee Mohammad Taher Pilehvar José Camacho-Collados

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with...

10.1162/coli_a_00405 article EN cc-by-nc-nd Computational Linguistics 2021-03-29

Clinical Free Text Summaries and LLMs: A Case Study on LLM-supported Identification of Intellectual Disabilities in Clinical Free Text Summaries (Preprint)

Aleksandra Edwards Antonio F. Pardiñas George Kirov Elliott Rees José Camacho-Collados

Background: Free-text clinical data that is unstructured and narrative in nature can provide a rich source of patient information. However, the information contained within routinely collected health data is typically captured as free-text, and extracting research quality phenotypes from these data remains a challenge. Manually reviewing free-text notes is a time-consuming process not suitable for large scale datasets. On the other hand, automatic extraction can be a challenging task due to medical researchers...

10.2196/preprints.72256 preprint EN cc-by 2025-02-06

A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2001 article EN cc-by 2015-01-01

Towards a Seamless Integration of Word Senses into Downstream NLP Applications

Mohammad Taher Pilehvar José Camacho-Collados Roberto Navigli Nigel Collier

Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine...

10.18653/v1/p17-1170 preprint EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

A Unified Multilingual Semantic Representation of Concepts

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

10.3115/v1/p15-1072 article EN cc-by 2015-01-01