Robert Östling

ORCID: 0000-0002-6027-4156
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Speech and dialogue systems
  • Speech Recognition and Synthesis
  • Multimodal Machine Learning Applications
  • Semantic Web and Ontologies
  • Authorship Attribution and Profiling
  • Hearing Impairment and Communication
  • Hand Gesture Recognition Systems
  • Sentiment Analysis and Opinion Mining
  • Lexicography and Language Studies
  • Language, Metaphor, and Cognition
  • Algorithms and Data Compression
  • Child and Animal Learning Development
  • Swearing, Euphemism, Multilingualism
  • Language Development and Disorders
  • Mathematics, Computing, and Information Processing
  • Advanced Text Analysis Techniques
  • Language and cultural evolution
  • Linguistics and Terminology Studies
  • Text and Document Classification Technologies
  • Team Dynamics and Performance
  • Linguistic Studies and Language Acquisition
  • Cognitive Science and Mapping

Stockholm University
2013-2024

Dartmouth College
2021

University of Stuttgart
2021

Uppsala University
2017-2021

University of Duisburg-Essen
2021

East Stroudsburg University
2021

Stockholm School of Economics
2018-2020

Hong Kong University of Science and Technology
2020

University of Hong Kong
2020

Carleton College
2020

We present EFMARAL, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture we are able to surpass the fast_align system, commonly used for performance-critical word alignment, both in computational efficiency and alignment accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from EFMARAL than...

10.1515/pralin-2016-0013 article EN The Prague Bulletin of Mathematical Linguistics 2016-10-01

Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors...

10.18653/v1/e17-2102 article EN cc-by 2017-01-01

A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate...

10.1162/coli_a_00351 article EN cc-by-nc-nd Computational Linguistics 2019-03-20

This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a corpus of blog posts,...

10.3384/nejlt.2000-1533.1331 article EN Northern European Journal of Language Technology 2013-09-16

Robert Östling. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2034 article EN cc-by 2015-01-01

In this paper, we investigate frequency and duration of signs and parts of speech in Swedish Sign Language (SSL) using the SSL Corpus. The duration of signs is correlated with frequency, with high-frequency items having shorter duration than low-frequency items. Similarly, function words (e.g. pronouns) have shorter duration than content words (e.g. nouns). In compounds, forms annotated as reduced display shorter duration. Fingerspelling duration correlates with word length of the corresponding Swedish words, and frequency and word length play a role in the lexicalization of fingerspellings. The sign distribution in the SSL Corpus shows a great deal...

10.1075/sll.19.2.01bor article EN Sign Language & Linguistics 2016-12-31

Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output. We demonstrate that NMT can be used for low-resource languages as well, by introducing more local dependencies and using word alignments to learn sentence reordering during translation. In addition to our novel model, we also present an empirical evaluation of phrase-based statistical machine translation (SMT) to investigate the lower limits...

10.48550/arxiv.1708.05729 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We introduce the Helsinki Neural Machine Translation system (HNMT) and how it is applied in the news translation task at WMT 2017, where it ranked first in both the human and automatic evaluations for English-Finnish. We discuss the success of English-Finnish translations and the overall advantage of NMT over a strong SMT baseline. We also discuss our submissions for English-Latvian, English-Chinese and Chinese-English.

10.18653/v1/w17-4733 article EN cc-by 2017-01-01

We present a system for morphological reinflection based on an encoder-decoder neural network model with extra convolutional layers. In spite of its simplicity, the method performs reasonably well on all the languages of the SIGMORPHON 2016 shared task, particularly for the most challenging problem of limited-resources reinflection (track 2, task 3). We also find that using only convolution achieves surprisingly good results in this task, surpassing the accuracy of our encoder-decoder model for several languages.

10.18653/v1/w16-2003 article EN cc-by 2016-01-01

There has been a great amount of work done in the field of bitext alignment, but the problem of aligning words in massively parallel texts with hundreds or thousands of languages is largely unexplored. While the basic task is similar, there are also important differences in purpose, method and evaluation between the problems. In this work, I present a nonparametric Bayesian model that can be used for simultaneous word alignment in massively parallel corpora. This method is evaluated on a corpus containing 1144 translations of the New Testament.

10.3115/v1/e14-4024 article EN 2014-01-01

Jörg Tiedemann, Fabienne Cap, Jenna Kanerva, Filip Ginter, Sara Stymne, Robert Östling, Marion Weller-Di Marco. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers. 2016.

10.18653/v1/w16-2326 article EN cc-by 2016-01-01

In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt discourse connectives. This is challenging even for humans, leading to a shortage of annotated data, a fact that makes the task even more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through explicitation of implicit relations to reduce the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our...

10.18653/v1/2021.unimplicit-1.1 article EN cc-by 2021-01-01

Pre-trained multilingual language models have become an important building block in multilingual Natural Language Processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has been previously assembled. We find that the XLM-RoBERTa family of models consistently show the best performance, by simultaneously being good monolingual models and degrading relatively little in a zero-shot setting....

10.18653/v1/2021.repl4nlp-1.2 article EN cc-by 2021-01-01

To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some...

10.1162/coli_a_00491 article EN cc-by-nc-nd Computational Linguistics 2023-01-01

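In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper we show on utterance level in spontaneous Swedish speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds, even when surprisal along with utterance length and differences in articulation rate between speakers are controlled for. These results indicate that adults adjust their articulation rate to...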

10.21437/interspeech.2017-1052 article EN Interspeech 2017 2017-08-16