- Topic Modeling
- Natural Language Processing Techniques
- Text Readability and Simplification
- Semantic Web and Ontologies
- Multimodal Machine Learning Applications
- Wikis in Education and Collaboration
- Information Retrieval and Search Behavior
- Speech and Dialogue Systems
Sapienza University of Rome
2020-2023
Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER....
The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme, which, by jointly leveraging contextualized word embeddings and the information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments,...
Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over the last few decades, multiple efforts have been undertaken to investigate incorrect translations caused by the polysemous nature of words. Within this body of research, some studies posited that models pick up semantic biases existing in the training data, thus producing translation errors. In this paper, we present DiBiMT, the first entirely manually-curated evaluation benchmark which enables an extensive study of semantic biases in Machine Translation of nominal...
With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of few keywords only, which increases ambiguity and makes their contextualization harder, and ii) performing neural ranking on non-English documents is still cumbersome due to the shortage of labeled datasets. In this paper we present SIR (Sense-enhanced Information Retrieval) to mitigate both problems by leveraging...
Niccolò Campolungo, Tommaso Pasini, Denis Emelin, Roberto Navigli. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
Despite the remarkable progress made in the field of Machine Translation (MT), current systems still struggle when translating ambiguous words, especially when these express infrequent meanings. In order to investigate and analyze the impact of lexical ambiguity on automatic translations, several tasks and evaluation benchmarks have been proposed over the course of the last few years. However, works in this research direction suffer from critical shortcomings. Indeed, existing evaluation datasets are not entirely manually...
Over the last few years, Masked Language Modeling (MLM) pre-training has resulted in remarkable advancements in many Natural Language Understanding (NLU) tasks, which sparked an interest in researching alternatives and extensions to the MLM objective. In this paper, we tackle the absence of explicit semantic grounding in MLM and propose Descriptive Masked Language Modeling (DMLM), a knowledge-enhanced reading comprehension objective, where the model is required to predict the most likely word in a context, being provided with the word’s definition. For instance, given...