- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Speech and Dialogue Systems
- Semantic Web and Ontologies
- Wikis in Education and Collaboration
- Artificial Intelligence in Law
- Multimodal Machine Learning Applications
- Law, AI, and Intellectual Property
- Comparative and International Law Studies
- Information Retrieval and Search Behavior
- 3D Surveying and Cultural Heritage
- Urban Planning and Valuation
- Law in Society and Culture
- Advanced Vision and Imaging
- Video Analysis and Summarization
- Artificial Intelligence in Games
- Web Data Mining and Analysis
- Ethics and Social Impacts of AI
IT University of Copenhagen
2023
Tokyo Institute of Technology
2023
Administration for Community Living
2023
American Jewish Committee
2023
Sapienza University of Rome
2014-2022
University of Copenhagen
2021-2022
Laboratoire d'Informatique de Paris-Nord
2022
University of Southern California
2020
IIT@MIT
2018
Bar-Ilan University
2018
Contextual representations of words derived by neural language models have proven to effectively encode the subtle distinctions that might occur between different meanings of the same word. However, these representations are not tied to a semantic network, hence they leave the word meanings implicit and thereby neglect the information that can be derived from the knowledge base itself. In this paper, we propose SensEmBERT, a knowledge-based approach that brings together the expressive power of language modelling and the vast amount of knowledge contained in a semantic network to produce high-quality latent...
Word Sense Disambiguation (WSD) aims at making explicit the semantics of a word in context by identifying the most suitable meaning from a predefined sense inventory. Recent breakthroughs in representation learning have fueled intensive WSD research, resulting in considerable performance improvements, breaching the 80% glass ceiling set by the inter-annotator agreement. In this survey, we provide an extensive overview of current advances in WSD, describing the state of the art in terms of i) resources for the task, i.e., sense inventories and...
Contextualized word embeddings have been employed effectively across several tasks in Natural Language Processing, as they proved to carry useful semantic information. However, it is still hard to link them to structured sources of knowledge. In this paper we present ARES (context-AwaRe Embeddings of Senses), a semi-supervised approach to producing sense embeddings for the lexical meanings within a lexical knowledge base that lie in a space that is comparable to that of contextualized word vectors. ARES representations enable a simple 1 Nearest-Neighbour...
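The 1 Nearest-Neighbour idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the ARES implementation: the sense keys, the three-dimensional vectors, and the function names `cosine` and `nearest_sense` are hypothetical, and the sketch only assumes that sense embeddings and the contextual vector of the target word live in the same space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_sense(context_vector, sense_vectors):
    """Return the sense whose embedding is closest to the context vector (1-NN)."""
    return max(sense_vectors, key=lambda s: cosine(context_vector, sense_vectors[s]))


# Toy, made-up sense embeddings for the word "bank" (assumption, not real data).
SENSES = {
    "bank%finance": [0.9, 0.1, 0.0],
    "bank%river": [0.1, 0.9, 0.2],
}

# Hypothetical contextual vector of "bank" in "they sat on the bank of the stream".
context = [0.2, 0.8, 0.1]
predicted = nearest_sense(context, SENSES)
print(predicted)
```

In practice the candidate set would be restricted to the senses that the inventory lists for the target lemma, and the vectors would come from a pretrained language model rather than being written by hand.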
Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, Horacio Saggion. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.
Word Sense Disambiguation (WSD) is a historical NLP task aimed at linking words in contexts to discrete sense inventories and it is usually cast as a multi-label classification task. Recently, several neural approaches have employed sense definitions to better represent word meanings. Yet, these approaches do not observe the input sentence and the sense definition candidates all at once, thus potentially reducing the model performance and generalization power. We cope with this issue by reframing WSD as a span extraction problem, which we...
We present WiBi, an approach to the automatic creation of a bitaxonomy for Wikipedia, that is, an integrated taxonomy of Wikipage pages and categories. We leverage the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show a higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet and WikiTaxonomy. WiBi is available at http://wibitaxonomy.org.
Annotating large numbers of sentences with senses is the heaviest requirement of current Word Sense Disambiguation. We present Train-O-Matic, a language-independent method for generating millions of sense-annotated training instances for virtually all meanings of words in a language's vocabulary. The approach is fully automatic: no human intervention is required and the only type of human knowledge used is a WordNet-like resource. Train-O-Matic achieves consistently state-of-the-art performance across gold standard datasets...
The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing evaluation benchmarks for assessing this criterion are tied to sense inventories (usually WordNet), restricting their usage to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) addresses the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem; but, it is limited to the English language. We put forward a large...
Transformer-based architectures brought a breeze of change to Word Sense Disambiguation (WSD), improving models' performances by a large margin. The fast development of new approaches has been further encouraged by a well-framed evaluation suite for English, which has allowed their performances to be kept track of and compared fairly. However, other languages have remained largely unexplored, as testing data are available for a few languages only and the evaluation setting is rather matted. In this paper, we untangle this situation by proposing XL-WSD,...
Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word in a given context. It lies at the base of Natural Language Processing as it provides semantic information for words. In the last decade, great strides have been made in this field and much effort has been devoted to mitigate the knowledge acquisition bottleneck problem, i.e., the problem of semantically annotating texts at a large scale and in different languages. This issue is ubiquitous in WSD as it hinders the creation of both multilingual knowledge bases and manually-curated training sets....
The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to multilingual Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme, which, by jointly leveraging contextualized word embeddings and the multilingual information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments,...
Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Schwemer, Anders Søgaard. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
The well-known problem of knowledge acquisition is one of the biggest issues in Word Sense Disambiguation (WSD), where annotated data are still scarce in English and almost absent in other languages. In this paper we formulate the assumption of One Sense per Wikipedia Category and present OneSeC, a language-independent method for the automatic extraction of hundreds of thousands of sentences in which a target word is tagged with its meaning. Our automatically-generated data consistently lead a supervised WSD model to state-of-the-art...
Word Sense Disambiguation (WSD) is the task of associating the correct meaning with a word in a given context. WSD provides explicit semantic information that is beneficial to several downstream applications, such as question answering, semantic parsing and hypernym extraction. Unfortunately, WSD suffers from the well-known knowledge acquisition bottleneck problem: it is very expensive, in terms of both time and money, to acquire semantic annotations for a large number of sentences. To address this blocking issue we present Train-O-Matic,...
Word Sense Disambiguation (WSD) is the task of associating a word in context with one of its meanings. While many works in the past have focused on raising the state of the art, none has even come close to achieving an F-score in the 80% ballpark when using WordNet as its sense inventory. We contend that one of the main reasons for this failure is the excessively fine granularity of this inventory, resulting in senses that are hard to differentiate between, even for an experienced human annotator. In this paper we cope with this long-standing problem by introducing the Coarse Sense Inventory...
Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org.
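To make the notions of sense distribution and most frequent sense concrete, here is a minimal sketch. It is not one of the two methods described in the abstract above, which work from a raw, unlabelled corpus: this example assumes, purely for illustration, that sense labels are already available, and the sense keys and function names are hypothetical.

```python
from collections import Counter


def sense_distribution(sense_tags):
    """Relative frequency of each sense among the tagged occurrences of one word."""
    counts = Counter(sense_tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sense: count / total for sense, count in counts.items()}


def most_frequent_sense(distribution):
    """Sense with the highest probability, i.e. the MFS baseline prediction."""
    return max(distribution, key=distribution.get)


# Made-up sense labels for four occurrences of "plant" (assumption, not real data).
tags = ["plant%factory", "plant%flora", "plant%flora", "plant%flora"]
dist = sense_distribution(tags)
mfs = most_frequent_sense(dist)
print(dist, mfs)
```

A system that always predicts `mfs` for every occurrence of the word is the most-frequent-sense baseline against which WSD models are commonly compared.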
Knowing the Most Frequent Sense (MFS) of a word has been proved to help Word Sense Disambiguation (WSD) models significantly. However, the scarcity of sense-annotated data makes it difficult to induce a reliable and high-coverage distribution of the meanings in a language vocabulary. To address this issue, in this paper we present CluBERT, an automatic and multilingual approach for inducing the distributions of word senses from a corpus of raw sentences. Our experiments show that the distributions CluBERT learns over the English senses are of higher quality than those extracted...
In this paper we examine how human-machine interaction in the legal sector is suggested to be regulated in the EU’s recently proposed Artificial Intelligence Act. First, we provide a brief background and overview of the proposal. Then we turn towards the assessment of high-risk AI systems for legal tasks as well as the obligations for such systems in terms of human-machine interaction. We argue that whereas the definition of AI system is broad, the concrete high-risk area of ‘administration of justice and democratic processes’, despite coming with considerable uncertainty, is narrow and unlikely to extend...
Iacer Calixto, Alessandro Raganato, Tommaso Pasini. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express...