- Topic Modeling
- Natural Language Processing Techniques
- Text Readability and Simplification
- Advanced Text Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Semantic Web and Ontologies
- Multimodal Machine Learning Applications
- Speech and Dialogue Systems
- Text and Document Classification Technologies
- Explainable Artificial Intelligence (XAI)
- Advanced Graph Neural Networks
- Biomedical Text Mining and Ontologies
- Misinformation and Its Impacts
- Advanced Neural Network Applications
- Speech Recognition and Synthesis
- Machine Learning and Data Classification
- Software Engineering Research
- Geographic Information Systems Studies
- Machine Learning in Healthcare
- Generative Adversarial Networks and Image Synthesis
- Service-Oriented Architecture and Web Services
- Hate Speech and Cyberbullying Detection
- Advanced Data Processing Techniques
- Translation Studies and Practices
- Software Reliability and Analysis Research
Cardiff University
2018-2024
Khatam University
2021-2023
Worcester Polytechnic Institute
2023
Iran University of Science and Technology
2018-2022
Tilburg University
2022
University of Cambridge
2016-2021
Pasargad Institute for Advanced Innovative Solutions
2020-2021
University of Massachusetts Amherst
2020
University of Washington
2020
Cornell University
2020

Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through...
Ignacio Iacobacci, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.
Recent years have seen a dramatic growth in the popularity of word embeddings, mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation, one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture,...
Mohammad Taher Pilehvar, Jose Camacho-Collados. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results...
Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our...
Geographical data can be obtained by converting place names from free-format text into geographical coordinates. The ability to geo-locate events in textual reports represents a valuable source of information in many real-world applications such as emergency responses, real-time social media event analysis, understanding location instructions in auto-response systems and more. However, geoparsing is still widely regarded as a challenge because of domain language diversity, place name ambiguity, metonymic...
José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.
By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for...
Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
This paper presents the Visual Word Sense Disambiguation (Visual-WSD) task. The objective of Visual-WSD is to identify among a set of ten images the one that corresponds to the intended meaning of a given ambiguous word which is accompanied with minimal context. The task provides datasets for three different languages: English, Italian, and Farsi. We received a total of 96 submissions. Out of these, 40 systems outperformed a strong zero-shot CLIP-based baseline. Participating systems proposed zero- and few-shot approaches, often...
This paper introduces a new SemEval task on Cross-Level Semantic Similarity (CLSS), which measures the degree to which the meaning of a larger linguistic item, such as a paragraph, is captured by a smaller item, such as a sentence. High-quality data sets were constructed for four comparison types using multi-stage annotation procedures with a graded scale of similarity. Nineteen teams submitted 38 systems. Most systems surpassed the baseline performance, with several attaining high performance for multiple comparison types. Further, our results show that...
One major deficiency of most semantic representation techniques is that they usually model a word type as a single point in the semantic space, hence conflating all the meanings that the word can have. Addressing this issue by learning distinct representations for individual meanings of words has been the subject of several research studies in the past few years. However, the generated sense representations are either not linked to any sense inventory or are unreliable for infrequent word senses. We propose a technique that tackles these problems by de-conflating the representations of words based on the deep knowledge it...
Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with...
The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system's performance. In this paper we address this issue by means of a realistic simulation...
Manually constructed taxonomies provide a crucial resource for many NLP technologies, yet these resources are often limited in their lexical coverage due to their construction procedure. While multiple approaches have been proposed to enrich such taxonomies with new concepts, these techniques are typically evaluated by measuring the accuracy at identifying relationships between words, e.g., that a dog is a canine, rather than relationships between specific concepts. Task 14 provides an evaluation framework for automatic taxonomy enrichment techniques by measuring the placement of...
José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.
This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and...
Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real world usage, by the lack of distinction between the different types of toponyms, which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript...
Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine...
José Camacho-Collados, Mohammad Taher Pilehvar, Roberto Navigli. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.