NFDI4DS | UHH-SEMS - Publication Details

Mohammad Taher Pilehvar

ORCID: 0000-0003-3694-4006

About

Contact & Profiles

A5091017313

Research Areas

Topic Modeling
Natural Language Processing Techniques
Text Readability and Simplification
Advanced Text Analysis Techniques
Sentiment Analysis and Opinion Mining
Semantic Web and Ontologies
Multimodal Machine Learning Applications
Speech and dialogue systems
Text and Document Classification Technologies
Explainable Artificial Intelligence (XAI)
Advanced Graph Neural Networks
Biomedical Text Mining and Ontologies
Misinformation and Its Impacts
Advanced Neural Network Applications
Speech Recognition and Synthesis
Machine Learning and Data Classification
Software Engineering Research
Geographic Information Systems Studies
Machine Learning in Healthcare
Generative Adversarial Networks and Image Synthesis
Service-Oriented Architecture and Web Services
Hate Speech and Cyberbullying Detection
Advanced Data Processing Techniques
Translation Studies and Practices
Software Reliability and Analysis Research

Cardiff University
2018-2024

Khatam University
2021-2023

Worcester Polytechnic Institute
2023

Iran University of Science and Technology
2018-2022

Tilburg University
2022

University of Cambridge
2016-2021

Pasargad Institute for Advanced Innovative Solutions
2020-2021

University of Massachusetts Amherst
2020

University of Washington
2020

Cornell University
2020

From Word To Sense Embeddings: A Survey on Vector Representations of Meaning

OPENALEX - Publications

José Camacho-Collados Mohammad Taher Pilehvar

Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through...

10.1613/jair.1.11259 article EN cc-by Journal of Artificial Intelligence Research 2018-12-06

SensEmbed: Learning Sense Embeddings for Word and Relational Similarity

OPENALEX - Publications

Ignacio Iacobacci Mohammad Taher Pilehvar Roberto Navigli

Ignacio Iacobacci, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.

10.3115/v1/p15-1010 article EN cc-by 2015-01-01

Embeddings for Word Sense Disambiguation: An Evaluation Study

OPENALEX - Publications

Ignacio Iacobacci Mohammad Taher Pilehvar Roberto Navigli

Recent years have seen a dramatic growth in the popularity of word embeddings, mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation, one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture,...

10.18653/v1/p16-1085 article EN cc-by Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016-01-01

WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations

OPENALEX - Publications

Mohammad Taher Pilehvar José Camacho-Collados

Mohammad Taher Pilehvar, José Camacho-Collados. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1128 article EN 2019-01-01

Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities

OPENALEX - Publications

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

10.1016/j.artint.2016.07.005 article EN publisher-specific-oa Artificial Intelligence 2016-08-17

SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity

OPENALEX - Publications

José Camacho-Collados Mohammad Taher Pilehvar Nigel Collier Roberto Navigli

This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results...

10.18653/v1/s17-2002 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

OPENALEX - Publications

José Camacho-Collados Mohammad Taher Pilehvar

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our...

10.18653/v1/w18-5406 article EN cc-by 2018-01-01

What’s missing in geographical parsing?

OPENALEX - Publications

Milan Gritta Mohammad Taher Pilehvar Nut Limsopatham Nigel Collier

Geographical data can be obtained by converting place names from free-format text into geographical coordinates. The ability to geo-locate events in textual reports represents a valuable source of information in many real-world applications such as emergency responses, real-time social media event analysis, understanding location instructions in auto-response systems and more. However, geoparsing is still widely regarded as a challenge because of domain language diversity, name ambiguity, metonymic...

10.1007/s10579-017-9385-8 article EN cc-by Language Resources and Evaluation 2017-03-07

NASARI: a Novel Approach to a Semantically-Aware Representation of Items

OPENALEX - Publications

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.

10.3115/v1/n15-1059 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

From senses to texts: An all-in-one graph-based approach for measuring semantic similarity

OPENALEX - Publications

Mohammad Taher Pilehvar Roberto Navigli

10.1016/j.artint.2015.07.005 article EN publisher-specific-oa Artificial Intelligence 2015-07-16

WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations

OPENALEX - Publications

Mohammad Taher Pilehvar José Camacho-Collados

By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for...

10.48550/arxiv.1808.09121 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter

OPENALEX - Publications

Costanza Conforti Jakob Berndt Mohammad Taher Pilehvar Chryssi Giannitsarou Flavio Toxvaerd Nigel Collier

Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.

10.18653/v1/2020.acl-main.157 article EN cc-by 2020-01-01

SemEval-2023 Task 1: Visual Word Sense Disambiguation

OPENALEX - Publications

Alessandro Raganato Iacer Calixto Asahi Ushio José Camacho-Collados Mohammad Taher Pilehvar

This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The objective of Visual-WSD is to identify among a set of ten images the one that corresponds to the intended meaning of a given ambiguous word which is accompanied with minimal context. The task provides datasets for three different languages: English, Italian, and Farsi. We received a total of 96 submissions. Out of these, 40 systems outperformed a strong zero-shot CLIP-based baseline. Participating systems proposed zero- and few-shot approaches, often...

10.18653/v1/2023.semeval-1.308 article EN cc-by Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) 2023-01-01

SemEval-2014 Task 3: Cross-Level Semantic Similarity

OPENALEX - Publications

David Jurgens Mohammad Taher Pilehvar Roberto Navigli

This paper introduces a new SemEval task on Cross-Level Semantic Similarity (CLSS), which measures the degree to which the meaning of a larger linguistic item, such as a paragraph, is captured by a smaller item, such as a sentence. High-quality data sets were constructed for four comparison types using multi-stage annotation procedures with a graded scale of similarity. Nineteen teams submitted 38 systems. Most systems surpassed the baseline performance, with several attaining high performance for multiple comparison types. Further, our results show that...

10.3115/v1/s14-2003 article EN cc-by Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) 2014-01-01

De-Conflated Semantic Representations

OPENALEX - Publications

Mohammad Taher Pilehvar Nigel Collier

One major deficiency of most semantic representation techniques is that they usually model a word type as a single point in the semantic space, hence conflating all the meanings that the word can have. Addressing this issue by learning distinct representations for individual meanings of words has been the subject of several research studies in the past few years. However, the generated sense representations are either not linked to any sense inventory or are unreliable for infrequent word senses. We propose a technique that tackles these problems by de-conflating the representations of words based on the deep knowledge that can be...

10.18653/v1/d16-1174 preprint EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

De-Conflated Semantic Representations

OPENALEX - Publications

Mohammad Taher Pilehvar Nigel Henry Collier

One major deficiency of most semantic representation techniques is that they usually model a word type as a single point in the semantic space, hence conflating all the meanings that the word can have. Addressing this issue by learning distinct representations for individual meanings of words has been the subject of several research studies in the past few years. However, the generated sense representations are either not linked to any sense inventory or are unreliable for infrequent word senses. We propose a technique that tackles these problems by de-conflating the representations of words based on the deep knowledge it...

10.17863/cam.7464 article EN Empirical Methods in Natural Language Processing 2016-12-30

Analysis and Evaluation of Language Models for Word Sense Disambiguation

OPENALEX - Publications

Daniel Loureiro Kiamehr Rezaee Mohammad Taher Pilehvar José Camacho-Collados

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with...

10.1162/coli_a_00405 article EN cc-by-nc-nd Computational Linguistics 2021-03-29

A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation

OPENALEX - Publications

Mohammad Taher Pilehvar Roberto Navigli

The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine systems' performance. In this paper we address this issue by means of a realistic simulation...

10.1162/coli_a_00202 article EN cc-by-nc-nd Computational Linguistics 2014-06-18

SemEval-2016 Task 14: Semantic Taxonomy Enrichment

OPENALEX - Publications

David Jurgens Mohammad Taher Pilehvar

Manually constructed taxonomies provide a crucial resource for many NLP technologies, yet these resources are often limited in their lexical coverage due to their construction procedure. While multiple approaches have been proposed to enrich such taxonomies with new concepts, these techniques are typically evaluated by measuring the accuracy at identifying relationships between words, e.g., that a dog is a canine, rather than relationships between specific concepts. Task 14 provides an evaluation framework for automatic taxonomy enrichment techniques by measuring the placement of...

10.18653/v1/s16-1169 article EN cc-by Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016-01-01

A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets

OPENALEX - Publications

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2001 article EN cc-by 2015-01-01

Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs

OPENALEX - Publications

Dimitri Kartsaklis Mohammad Taher Pilehvar Nigel Collier

This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and...

10.18653/v1/d18-1221 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

A pragmatic guide to geoparsing evaluation

OPENALEX - Publications

Milan Gritta Mohammad Taher Pilehvar Nigel Collier

Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real world usage, by the lack of distinction between the different types of toponyms, which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript...

10.1007/s10579-019-09475-3 article EN cc-by Language Resources and Evaluation 2019-09-19

Towards a Seamless Integration of Word Senses into Downstream NLP Applications

OPENALEX - Publications

Mohammad Taher Pilehvar José Camacho-Collados Roberto Navigli Nigel Collier

Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine...

10.18653/v1/p17-1170 preprint EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

A Unified Multilingual Semantic Representation of Concepts

OPENALEX - Publications

José Camacho-Collados Mohammad Taher Pilehvar Roberto Navigli

10.3115/v1/p15-1072 article EN cc-by 2015-01-01
