Olga Majewska

ORCID: 0000-0003-4509-8817
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Multimodal Machine Learning Applications
  • Biomedical Text Mining and Ontologies
  • Machine Learning in Bioinformatics
  • Text Readability and Simplification
  • Explainable Artificial Intelligence (XAI)
  • Anomaly Detection Techniques and Applications
  • Data Management and Algorithms
  • Data Mining Algorithms and Applications
  • Data Quality and Management
  • Text and Document Classification Technologies
  • Authorship Attribution and Profiling
  • Advanced Text Analysis Techniques
  • Image Retrieval and Classification Techniques
  • Neurobiology of Language and Bilingualism

University of Cambridge
2017-2023

Center for Applied Linguistics
2020

In order to simulate human language capacity, natural language processing systems must be able to reason about the dynamics of everyday situations, including their possible causes and effects. Moreover, they should be able to generalise the acquired world knowledge to new languages, modulo cultural differences. Advances in machine reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Alternatives (XCOPA), a...

10.18653/v1/2020.emnlp-main.185 article EN 2020-01-01

We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields,...

10.1162/coli_a_00391 article EN cc-by-nc-nd Computational Linguistics 2020-10-22

Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš. Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. 2020.

10.18653/v1/2020.deelio-1.5 article EN cc-by 2020-01-01

Multilingual task-oriented dialogue (ToD) facilitates access to services and information for many (communities of) speakers. Nevertheless, its potential is not fully realized, as current multilingual ToD datasets, for both modular and end-to-end modeling, suffer from severe limitations. 1) When created from scratch, they are usually small in scale and fail to cover many possible dialogue flows. 2) Translation-based ToD datasets might lack naturalness and cultural specificity in the target language. In this work, to tackle these...

10.1162/tacl_a_00539 article EN cc-by Transactions of the Association for Computational Linguistics 2023-01-01

In task-oriented dialogue (ToD), a user holds a conversation with an artificial agent with the aim of completing a concrete task. Although this technology represents one of the central objectives of AI and has been the focus of ever more intense research and development efforts, it is currently limited to a few narrow domains (e.g., food ordering, ticket booking) and a handful of languages (e.g., English, Chinese). This work provides an extensive overview of existing methods and resources in multilingual ToD as an entry point to this exciting and emerging field....

10.1613/jair.1.13083 article EN cc-by Journal of Artificial Intelligence Research 2022-07-13

Olga Majewska, Ivan Vulić, Goran Glavaš, Edoardo Maria Ponti, Anna Korhonen. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.541 article EN cc-by 2021-01-01

Background: Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The costliness and time required for manual lexicon construction has been a major obstacle...

10.1186/s13326-021-00247-z article EN cc-by Journal of Biomedical Semantics 2021-07-15

VerbNet, an extensive computational verb lexicon for English, has proved useful for supporting a wide range of Natural Language Processing tasks requiring information about the behaviour and meaning of verbs. Biomedical text processing and mining could benefit from a similar resource. We take the first step towards the development of BioVerbNet: A VerbNet specifically aimed at describing verbs in the area of biomedicine. Because VerbNet-style classification is extremely time consuming, we start from a small manual classification of biomedical...

10.1186/s13326-018-0193-x article EN cc-by Journal of Biomedical Semantics 2019-01-18

Research into representation learning models of lexical semantics usually utilizes some form of intrinsic evaluation to ensure that the learned representations reflect human semantic judgments. Lexical semantic similarity estimation is a widely used evaluation method, but efforts have typically focused on pairwise judgments of words in isolation, or are limited to specific contexts and stimuli. There are limitations with these approaches that either do not provide any context for judgments, and thereby ignore ambiguity,...

10.1162/coli_a_00396 article EN cc-by-nc-nd Computational Linguistics 2021-03-01

Recent advances in deep learning have also enabled fast progress in the research of task-oriented dialogue (ToD) systems. However, the majority of ToD systems are developed for English and merely a handful of other widely spoken languages, e.g., Chinese and German. This hugely limits the global reach and, consequently, the transformative socioeconomic potential of such systems. In this tutorial, we will thus discuss and demonstrate the importance of (building) multilingual ToD systems, and then provide a systematic overview of current research gaps, challenges...

10.18653/v1/2022.acl-tutorials.8 article EN cc-by 2022-01-01

VerbNet, the most extensive online verb lexicon currently available for English, has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for few languages only. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different,...

10.1007/s10579-017-9403-x article EN cc-by Language Resources and Evaluation 2017-10-20

Recent developments in language modeling have enabled large text encoders to derive a wealth of linguistic information from raw corpora without supervision. Their success across natural language processing (NLP) tasks has called into question the role of man-made computational resources, such as verb lexicons, in supporting modern NLP. Still, probing analyses have concurrently exposed the limitations of the knowledge possessed by the neural architectures, revealing them to be clever task solvers rather than self-taught...

10.1146/annurev-linguistics-030521-043632 article EN cc-by Annual Review of Linguistics 2022-10-21

We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics. We demonstrate SpAM's utility in allowing for quick bottom-up creation of large-scale evaluation datasets that balance cross-lingual alignment with language specificity. Starting from a shared sample of 825 English verbs, translated into Chinese, Japanese, Finnish, Polish, and Italian, we apply a two-phase...

10.18653/v1/2020.coling-main.423 article EN cc-by Proceedings of the 28th International Conference on Computational Linguistics 2020-01-01

Multilingual task-oriented dialogue (ToD) facilitates access to services and information for many (communities of) speakers. Nevertheless, the potential of this technology is not fully realised, as current datasets for multilingual ToD, for both modular and end-to-end modelling, suffer from severe limitations. 1) When created from scratch, they are usually small in scale and fail to cover many possible dialogue flows. 2) Translation-based ToD datasets might lack naturalness and cultural specificity in the target language. In this work, to tackle these...

10.48550/arxiv.2201.13405 preprint EN cc-by arXiv (Cornell University) 2022-01-01

In task-oriented dialogue (ToD), a user holds a conversation with an artificial agent to complete a concrete task. Although this technology represents one of the central objectives of AI and has been the focus of ever more intense research and development efforts, it is currently limited to a few narrow domains (e.g., food ordering, ticket booking) and a handful of languages (e.g., English, Chinese). This work provides an extensive overview of existing methods and resources in multilingual ToD as an entry point to this exciting and emerging field. We...

10.48550/arxiv.2104.08570 preprint EN cc-by arXiv (Cornell University) 2021-01-01

In parallel to their overwhelming success across NLP tasks, the language ability of deep Transformer networks, pretrained via language modeling (LM) objectives, has undergone extensive scrutiny. While probing revealed that these models encode a range of syntactic and semantic properties of language, they are still prone to fall back on superficial cues and simple heuristics to solve downstream tasks, rather than leverage deeper linguistic knowledge. In this paper, we target one such area of their deficiency, verbal reasoning. We...

10.48550/arxiv.2012.15421 preprint EN other-oa arXiv (Cornell University) 2020-01-01