NFDI4DS | UHH-SEMS - Publication Details

Olga Majewska

ORCID: 0000-0003-4509-8817


About

Contact & Profiles

A5058032142

Research Areas

Topic Modeling
Natural Language Processing Techniques
Speech and dialogue systems
Multimodal Machine Learning Applications
Biomedical Text Mining and Ontologies
Machine Learning in Bioinformatics
Text Readability and Simplification
Explainable Artificial Intelligence (XAI)
Anomaly Detection Techniques and Applications
Data Management and Algorithms
Data Mining Algorithms and Applications
Data Quality and Management
Text and Document Classification Technologies
Authorship Attribution and Profiling
Advanced Text Analysis Techniques
Image Retrieval and Classification Techniques
Neurobiology of Language and Bilingualism

University of Cambridge
2017-2023

Center for Applied Linguistics
2020

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

OPENALEX - Publications

Edoardo Maria Ponti Goran Glavaš Olga Majewska Qianchu Liu Ivan Vulić and 1 more

In order to simulate human language capacity, natural language processing systems must be able to reason about the dynamics of everyday situations, including their possible causes and effects. Moreover, they should generalise the acquired world knowledge to new languages, modulo cultural differences. Advances in machine reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Alternatives (XCOPA), a...

10.18653/v1/2020.emnlp-main.185 article EN 2020-01-01

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity

Ivan Vulić Simon Baker Edoardo Maria Ponti Ulla Petti Ira Leviant and 7 more

We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields,...

10.1162/coli_a_00391 article EN cc-by-nc-nd Computational Linguistics 2020-10-22

Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers

Anne Lauscher Olga Majewska Leonardo F. R. Ribeiro Iryna Gurevych Nikolai Rozanov and 1 more

Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš. Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. 2020.

10.18653/v1/2020.deelio-1.5 article EN cc-by 2020-01-01

Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation

Olga Majewska Evgeniia Razumovskaia Edoardo Maria Ponti Ivan Vulić Anna Korhonen

Multilingual task-oriented dialogue (ToD) facilitates access to services and information for many (communities of) speakers. Nevertheless, its potential is not fully realized, as current multilingual ToD datasets, both for modular and end-to-end modeling, suffer from severe limitations. 1) When created from scratch, they are usually small in scale and fail to cover many possible dialogue flows. 2) Translation-based ToD datasets might lack naturalness and cultural specificity in the target language. In this work, to tackle these...

10.1162/tacl_a_00539 article EN cc-by Transactions of the Association for Computational Linguistics 2023-01-01

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

Evgeniia Razumovskaia Goran Glavaš Olga Majewska Edoardo Maria Ponti Anna Korhonen and 1 more

In task-oriented dialogue (ToD), a user holds a conversation with an artificial agent with the aim of completing a concrete task. Although this technology represents one of the central objectives of AI and has been the focus of ever more intense research and development efforts, it is currently limited to a few narrow domains (e.g., food ordering, ticket booking) and a handful of languages (e.g., English, Chinese). This work provides an extensive overview of existing methods and resources in multilingual ToD as an entry point to this exciting and emerging field....

10.1613/jair.1.13083 article EN cc-by Journal of Artificial Intelligence Research 2022-07-13

Verb Knowledge Injection for Multilingual Event Processing

Olga Majewska Ivan Vulić Goran Glavaš Edoardo Maria Ponti Anna Korhonen

Olga Majewska, Ivan Vulić, Goran Glavaš, Edoardo Maria Ponti, Anna Korhonen. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.541 article EN cc-by 2021-01-01

BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine

Olga Majewska Charlotte Collins Simon Baker Jari Björne Susan Windisch Brown and 2 more

Background: Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The costliness and time required for manual lexicon construction has been a major obstacle...

10.1186/s13326-021-00247-z article EN cc-by Journal of Biomedical Semantics 2021-07-15

A neural classification method for supporting the creation of BioVerbNet

Billy Chiu Olga Majewska Sampo Pyysalo Laura T. Wey Ulla Stenius and 2 more

VerbNet, an extensive computational verb lexicon for English, has proved useful for supporting a wide range of Natural Language Processing tasks requiring information about the behaviour and meaning of verbs. Biomedical text processing and mining could benefit from a similar resource. We take the first step towards the development of BioVerbNet: a VerbNet specifically aimed at describing verbs in the area of biomedicine. Because VerbNet-style classification is extremely time consuming, we start from a small manual classification of biomedical...

10.1186/s13326-018-0193-x article EN cc-by Journal of Biomedical Semantics 2019-01-18

Semantic Data Set Construction from Human Clustering and Spatial Arrangement

Olga Majewska Diana McCarthy Jasper J.F. van den Bosch Nikolaus Kriegeskorte Ivan Vulić and 1 more

Research into representation learning models of lexical semantics usually utilizes some form of intrinsic evaluation to ensure that the learned representations reflect human semantic judgments. Lexical semantic similarity estimation is a widely used evaluation method, but efforts have typically focused on pairwise judgments of words in isolation, or are limited to specific contexts and stimuli. There are limitations with these approaches that either do not provide any context for judgments, and thereby ignore ambiguity,...

10.1162/coli_a_00396 article EN cc-by-nc-nd Computational Linguistics 2021-03-01

Natural Language Processing for Multilingual Task-Oriented Dialogue

Evgeniia Razumovskaia Goran Glavaš Olga Majewska Edoardo Maria Ponti Ivan Vulić

Recent advances in deep learning have also enabled fast progress in the research of task-oriented dialogue (ToD) systems. However, the majority of ToD systems are developed for English and merely a handful of other widely spoken languages, e.g., Chinese and German. This hugely limits the global reach and, consequently, the transformative socioeconomic potential of such systems. In this tutorial, we will thus discuss and demonstrate the importance of (building) multilingual ToD systems, and then provide a systematic overview of current gaps, challenges...

10.18653/v1/2022.acl-tutorials.8 article EN cc-by 2022-01-01

Investigating the cross-lingual translatability of VerbNet-style classification

Olga Majewska Ivan Vulić Diana McCarthy Yan Huang Akira Murakami and 2 more

VerbNet, the most extensive online verb lexicon currently available for English, has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for few languages only. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different,...

10.1007/s10579-017-9403-x article EN cc-by Language Resources and Evaluation 2017-10-20

Acquiring verb classes through bottom-up semantic verb clustering

Olga Majewska Diana McCarthy Ivan Vulić Anna Korhonen

10.17863/cam.26535 article EN Language Resources and Evaluation 2018-05-01

Verb Classification Across Languages

Olga Majewska Anna Korhonen

Recent developments in language modeling have enabled large text encoders to derive a wealth of linguistic information from raw text corpora without supervision. Their success across natural language processing (NLP) tasks has called into question the role of man-made computational resources, such as verb lexicons, in supporting modern NLP. Still, probing analyses have concurrently exposed the limitations of the knowledge possessed by the neural architectures, revealing them to be clever task solvers rather than self-taught...

10.1146/annurev-linguistics-030521-043632 article EN cc-by Annual Review of Linguistics 2022-10-21

Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis

Olga Majewska Ivan Vulić Diana McCarthy Anna Korhonen

We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics. We demonstrate SpAM’s utility in allowing for quick bottom-up creation of large-scale evaluation datasets that balance cross-lingual alignment with language specificity. Starting from a shared sample of 825 English verbs, translated into Chinese, Japanese, Finnish, Polish, and Italian, we apply a two-phase...

10.18653/v1/2020.coling-main.423 article EN cc-by Proceedings of the 17th international conference on Computational linguistics - 2020-01-01

Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation

Olga Majewska Evgeniia Razumovskaia Edoardo Maria Ponti Ivan Vulić Anna Korhonen

Multilingual task-oriented dialogue (ToD) facilitates access to services and information for many (communities of) speakers. Nevertheless, the potential of this technology is not fully realised, as current datasets for multilingual ToD, both for modular and end-to-end modelling, suffer from severe limitations. 1) When created from scratch, they are usually small in scale and fail to cover many possible dialogue flows. 2) Translation-based ToD datasets might lack naturalness and cultural specificity in the target language. In this work, to tackle these...

10.48550/arxiv.2201.13405 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

Evgeniia Razumovskaia Goran Glavaš Olga Majewska Edoardo Maria Ponti Anna Korhonen and 1 more

In task-oriented dialogue (ToD), a user holds a conversation with an artificial agent to complete a concrete task. Although this technology represents one of the central objectives of AI and has been the focus of ever more intense research and development efforts, it is currently limited to a few narrow domains (e.g., food ordering, ticket booking) and a handful of languages (e.g., English, Chinese). This work provides an extensive overview of existing methods and resources in multilingual ToD as an entry point to this exciting and emerging field. We...

10.48550/arxiv.2104.08570 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Verb Knowledge Injection for Multilingual Event Processing

Olga Majewska Ivan Vulić Goran Glavaš Edoardo Maria Ponti Anna Korhonen

In parallel to their overwhelming success across NLP tasks, the language ability of deep Transformer networks, pretrained via language modeling (LM) objectives, has undergone extensive scrutiny. While probing revealed that these models encode a range of syntactic and semantic properties of language, they are still prone to fall back on superficial cues and simple heuristics to solve downstream tasks, rather than leverage deeper linguistic knowledge. In this paper, we target one such area of deficiency, verbal reasoning. We...

10.48550/arxiv.2012.15421 preprint EN other-oa arXiv (Cornell University) 2020-01-01
