NFDI4DS | UHH-SEMS - Publication Details

Robert Östling

ORCID: 0000-0002-6027-4156

About

Contact & Profiles

A5103282437

Research Areas

Natural Language Processing Techniques
Topic Modeling
Text Readability and Simplification
Speech and dialogue systems
Speech Recognition and Synthesis
Multimodal Machine Learning Applications
Semantic Web and Ontologies
Authorship Attribution and Profiling
Hearing Impairment and Communication
Hand Gesture Recognition Systems
Sentiment Analysis and Opinion Mining
Lexicography and Language Studies
Language, Metaphor, and Cognition
Algorithms and Data Compression
Child and Animal Learning Development
Swearing, Euphemism, Multilingualism
Language Development and Disorders
Mathematics, Computing, and Information Processing
Advanced Text Analysis Techniques
Language and cultural evolution
Linguistics and terminology studies
Text and Document Classification Technologies
Team Dynamics and Performance
Linguistic Studies and Language Acquisition
Cognitive Science and Mapping

Stockholm University
2013-2024

Dartmouth College
2021

University of Stuttgart
2021

Uppsala University
2017-2021

University of Duisburg-Essen
2021

East Stroudsburg University
2021

Stockholm School of Economics
2018-2020

Hong Kong University of Science and Technology
2020

University of Hong Kong
2020

Carleton College
2020

Efficient Word Alignment with Markov Chain Monte Carlo

OPENALEX - Publications

Robert Östling Jörg Tiedemann

We present EFMARAL, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture we are able to surpass the fast_align system, commonly used for performance-critical word alignment, both in computational efficiency and alignment accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from EFMARAL than...

10.1515/pralin-2016-0013 article EN The Prague Bulletin of Mathematical Linguistics 2016-10-01

Continuous multilinguality with language vectors

Robert Östling Jörg Tiedemann

Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors...

10.18653/v1/e17-2102 article EN cc-by 2017-01-01

What Do Language Representations Really Represent?

Johannes Bjerva Robert Östling Maria Han Veiga Jörg Tiedemann Isabelle Augenstein

A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate...

10.1162/coli_a_00351 article EN cc-by-nc-nd Computational Linguistics 2019-03-20

Stagger: an Open-Source Part of Speech Tagger for Swedish

Robert Östling

This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a corpus of blog posts,...

10.3384/nejlt.2000-1533.1331 article EN Northern European Journal of Language Technology 2013-09-16

Word Order Typology through Multilingual Word Alignment

Robert Östling

Robert Östling. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2034 article EN cc-by 2015-01-01

Distribution and duration of signs and parts of speech in Swedish Sign Language

Carl Börstell Thomas Hörberg Robert Östling

In this paper, we investigate frequency and duration of signs and parts of speech in Swedish Sign Language (SSL) using the SSL Corpus. The duration of signs is correlated with frequency, with high-frequency items having shorter duration than low-frequency items. Similarly, function words (e.g. pronouns) have shorter duration than content words (e.g. nouns). In compounds, forms annotated as reduced display shorter duration. Fingerspelling duration correlates with word length of the corresponding Swedish words, and frequency and word length play a role in the lexicalization of fingerspellings. The sign distribution in the SSL Corpus shows a great deal...

10.1075/sll.19.2.01bor article EN Sign Language & Linguistics 2016-12-31

Neural machine translation for low-resource languages

Robert Östling Jörg Tiedemann

Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output. We demonstrate that NMT can be used for low-resource languages as well, by introducing more local dependencies and using word alignments to learn sentence reordering during translation. In addition to our novel model, we also present an empirical evaluation of low-resource phrase-based statistical machine translation (SMT) and NMT to investigate the lower limits...

10.48550/arxiv.1708.05729 preprint EN other-oa arXiv (Cornell University) 2017-01-01

The Helsinki Neural Machine Translation System

Robert Östling Yves Scherrer Jörg Tiedemann Gongbo Tang Tommi Nieminen

We introduce the Helsinki Neural Machine Translation system (HNMT) and how it is applied in the news translation task at WMT 2017, where it ranked first in both the human and automatic evaluations for English-Finnish. We discuss the success of English-Finnish translations and the overall advantage of NMT over a strong SMT baseline. We also discuss our submissions for English-Latvian, English-Chinese and Chinese-English.

10.18653/v1/w17-4733 article EN cc-by 2017-01-01

Morphological reinflection with convolutional neural networks

Robert Östling

We present a system for morphological reinflection based on an encoder-decoder neural network model with extra convolutional layers. In spite of its simplicity, the method performs reasonably well on all the languages of the SIGMORPHON 2016 shared task, particularly for the most challenging problem of limited-resources reinflection (track 2, task 3). We also find that using only convolution achieves surprisingly good results in this task, surpassing the accuracy of our encoder-decoder model for several languages.

10.18653/v1/w16-2003 article EN cc-by 2016-01-01

Bayesian Word Alignment for Massively Parallel Texts

Robert Östling

There has been a great amount of work done in the field of bitext alignment, but the problem of aligning words in massively parallel texts with hundreds or thousands of languages is largely unexplored. While the basic task is similar, there are also important differences in purpose, method and evaluation between the problems. In this work, I present a nonparametric Bayesian model that can be used for simultaneous word alignment in massively parallel corpora. This method is evaluated on a corpus containing 1144 translations of the New Testament.

10.3115/v1/e14-4024 article EN 2014-01-01

Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools

Jörg Tiedemann Fabienne Cap Jenna Kanerva Filip Ginter Sara Stymne Robert Östling Marion Weller-Di Marco

Jörg Tiedemann, Fabienne Cap, Jenna Kanerva, Filip Ginter, Sara Stymne, Robert Östling, Marion Weller-Di Marco. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers. 2016.

10.18653/v1/w16-2326 article EN cc-by 2016-01-01

Let’s be explicit about that: Distant supervision for implicit discourse relation classification via connective prediction

Murathan Kurfalı Robert Östling

In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt discourse connectives. This is challenging even for humans, leading to a shortage of annotated data, a fact that makes the task even more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through explicitation of implicit relations to reduce the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our...

10.18653/v1/2021.unimplicit-1.1 article EN cc-by 2021-01-01

Probing Multilingual Language Models for Discourse

Murathan Kurfalı Robert Östling

Pre-trained multilingual language models have become an important building block in multilingual Natural Language Processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has been previously assembled. We find that the XLM-RoBERTa family of models consistently show the best performance, by simultaneously being good monolingual models and degrading relatively little in a zero-shot setting....

10.18653/v1/2021.repl4nlp-1.2 article EN cc-by 2021-01-01

Language Embeddings Sometimes Contain Typological Generalizations

Robert Östling Murathan Kurfalı

To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some...

10.1162/coli_a_00491 article EN cc-by-nc-nd Computational Linguistics 2023-01-01

Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled for

Johan Sjons Thomas Hörberg Robert Östling Johannes Bjerva

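In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper we show on utterance level in spontaneous Swedish speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds, even when surprisal along with utterance length and differences in articulation rate between speakers are controlled for. These results indicate that adults adjust their articulation rate to...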

10.21437/interspeech.2017-1052 article EN Interspeech 2017 2017-08-16
