- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Speech and Dialogue Systems
- Speech Recognition and Synthesis
- Multimodal Machine Learning Applications
- Semantic Web and Ontologies
- Authorship Attribution and Profiling
- Hearing Impairment and Communication
- Hand Gesture Recognition Systems
- Sentiment Analysis and Opinion Mining
- Lexicography and Language Studies
- Language, Metaphor, and Cognition
- Algorithms and Data Compression
- Child and Animal Learning Development
- Swearing, Euphemism, Multilingualism
- Language Development and Disorders
- Mathematics, Computing, and Information Processing
- Advanced Text Analysis Techniques
- Language and Cultural Evolution
- Linguistics and Terminology Studies
- Text and Document Classification Technologies
- Team Dynamics and Performance
- Linguistic Studies and Language Acquisition
- Cognitive Science and Mapping
Stockholm University
2013-2024
Dartmouth College
2021
University of Stuttgart
2021
Uppsala University
2017-2021
University of Duisburg-Essen
2021
East Stroudsburg University
2021
Stockholm School of Economics
2018-2020
Hong Kong University of Science and Technology
2020
University of Hong Kong
2020
Carleton College
2020
We present EFMARAL, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture we are able to surpass the fast_align system, commonly used for performance-critical word alignment, both in computational efficiency and accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from EFMARAL than...
Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors...
A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate...
This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a corpus of blog posts,...
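The averaged perceptron named in the abstract above can be sketched in a few lines. This is a minimal illustration of the general technique, not Stagger's implementation: the class name `AveragedPerceptron`, the string-valued features, and the toy data in the usage note are all assumptions, and the lexicon and embedding features mentioned in the abstract are left out.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron with lazily averaged weights (illustrative sketch)."""

    def __init__(self, tags):
        self.tags = list(tags)
        self.weights = defaultdict(float)  # (feature, tag) -> current weight
        self._totals = defaultdict(float)  # (feature, tag) -> weight summed over steps
        self._stamps = defaultdict(int)    # (feature, tag) -> step of last change
        self.step = 0

    def predict(self, features):
        # Score each tag as the sum of the weights of its active features.
        scores = {
            tag: sum(self.weights.get((f, tag), 0.0) for f in features)
            for tag in self.tags
        }
        # Ties are broken by the order of self.tags.
        return max(self.tags, key=lambda tag: scores[tag])

    def _bump(self, key, delta):
        # Credit the old weight for every step it stayed unchanged, then update.
        self._totals[key] += (self.step - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self.step
        self.weights[key] += delta

    def update(self, features, gold):
        # One online training step: adjust weights only on a wrong prediction.
        self.step += 1
        guess = self.predict(features)
        if guess != gold:
            for f in features:
                self._bump((f, gold), 1.0)
                self._bump((f, guess), -1.0)

    def average(self):
        # Replace each weight by its mean value over all training steps.
        if self.step == 0:
            return
        for key, weight in list(self.weights.items()):
            total = self._totals[key] + (self.step - self._stamps[key]) * weight
            self.weights[key] = total / self.step
```

As a usage sketch with made-up data: create `AveragedPerceptron(["DET", "NOUN", "VERB"])`, call `update({"word=dog"}, "NOUN")` for each training token over a few passes, call `average()` once, and then use `predict({"word=dog"})`. Averaging tends to make the final weights less sensitive to the order of the last few updates.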
Robert Östling. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.
In this paper, we investigate frequency and duration of signs and parts of speech in Swedish Sign Language (SSL) using the SSL Corpus. The duration of signs is correlated with frequency, with high-frequency items having shorter duration than low-frequency items. Similarly, function words (e.g. pronouns) have shorter duration than content words (e.g. nouns). In compounds, forms annotated as reduced display shorter duration. Fingerspelling duration correlates with word length of corresponding words, and frequency and word length play a role in the lexicalization of fingerspellings. The sign distribution in the SSL Corpus shows a great deal...
Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output. We demonstrate that NMT can be used for low-resource languages as well, by introducing more local dependencies and using word alignments to learn sentence reordering during translation. In addition to our novel model, we also present an empirical evaluation of low-resource phrase-based statistical machine translation (SMT) and NMT to investigate the lower limits...
We introduce the Helsinki Neural Machine Translation system (HNMT) and how it is applied in the news translation task at WMT 2017, where it ranked first in both the human and automatic evaluations for English-Finnish. We discuss the success of English-Finnish translations and the overall advantage of NMT over a strong SMT baseline. We also discuss our submissions for English-Latvian, English-Chinese and Chinese-English.
We present a system for morphological reinflection based on an encoder-decoder neural network model with extra convolutional layers. In spite of its simplicity, the method performs reasonably well on all the languages of the SIGMORPHON 2016 shared task, particularly for the most challenging problem of limited-resources reinflection (track 2, task 3). We also find that using only convolution achieves surprisingly good results in this task, surpassing the accuracy of our encoder-decoder model for several languages.
There has been a great amount of work done in the field of bitext alignment, but the problem of aligning words in massively parallel texts with hundreds or thousands of languages is largely unexplored. While the basic task is similar, there are also important differences in purpose, method and evaluation between the problems. In this work, I present a nonparametric Bayesian model that can be used for simultaneous word alignment in massively parallel corpora. This method is evaluated on a corpus containing 1144 translations of the New Testament.
Jörg Tiedemann, Fabienne Cap, Jenna Kanerva, Filip Ginter, Sara Stymne, Robert Östling, Marion Weller-Di Marco. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers. 2016.
In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt discourse connectives. This is challenging even for humans, leading to a shortage of annotated data, a fact that makes the task even more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through explicitation of implicit relations to reduce the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our...
Pre-trained multilingual language models have become an important building block in multilingual Natural Language Processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has been previously assembled. We find that the XLM-RoBERTa family of models consistently show the best performance, by simultaneously being good monolingual models and degrading relatively little in a zero-shot setting....
To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some...
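In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper we show on the level of spontaneous speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in CDS as a function of the age of the child holds, even when surprisal along with utterance length and differences in articulation rate between speakers are controlled for. These results indicate that adults adjust their articulation rate to...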
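Surprisal, as used in the abstract above, is the negative log probability of a word under some language model. The sketch below is only an illustration: it assumes an add-alpha smoothed unigram model, since the abstract does not say which model the study used, and the function name `unigram_surprisal` is hypothetical.

```python
import math
from collections import Counter


def unigram_surprisal(tokens, corpus_tokens, alpha=1.0):
    """Return the surprisal in bits of each token under a unigram model.

    Probabilities are estimated from corpus_tokens with add-alpha smoothing;
    the vocabulary is the union of the corpus and the query tokens.
    With alpha=0, a token that never occurs in the corpus raises an error
    because its probability is zero.
    """
    counts = Counter(corpus_tokens)
    vocab = set(corpus_tokens) | set(tokens)
    total = len(corpus_tokens) + alpha * len(vocab)
    return [-math.log2((counts[t] + alpha) / total) for t in tokens]
```

Frequent words receive low surprisal and rare words high surprisal, which is the quantity that finding ii) relates to articulation rate.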