- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Multimodal Machine Learning Applications
- Speech Recognition and Synthesis
- COVID-19 and Mental Health
- Sentiment Analysis and Opinion Mining
- Behavioral Health and Interventions
- Speech and Dialogue Systems
- Mental Health Research Topics
- Semantic Web and Ontologies
- Social and Intergroup Psychology
- Discourse Analysis in Language Studies
- Advanced Text Analysis Techniques
- Misinformation and Its Impacts
- Child and Adolescent Psychosocial and Emotional Development
- Advanced Chemical Sensor Technologies
- COVID-19 Diagnosis Using AI
- Text and Document Classification Technologies
- Language Development and Disorders
- Wine Industry and Tourism
- Customer Service Quality and Loyalty
- Language, Metaphor, and Cognition
- Multi-Agent Systems and Negotiation
- Energy Efficient Wireless Sensor Networks
RISE Research Institutes of Sweden
2024-2025
Stockholm University
2019-2024
Dartmouth College
2021
University of Stuttgart
2021
Uppsala University
2021
University of Duisburg-Essen
2021
East Stroudsburg University
2021
Middle East Technical University
2016-2020
Åbo Akademi University
2020
Piaggio (Italy)
2020
Abstract This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in the resources used to train models for Natural Language Processing tasks which, like GEC itself, have implications for Learner...
In this paper, we investigate the effects of using subword information in representation learning. We argue that using syntactic units affects the quality of word representations positively. We introduce a morpheme-based model and compare it against word-based, character-based, and character n-gram level models. Our model takes a list of candidate segmentations for a word and learns a representation where the different segmentations are weighted by an attention mechanism. We performed experiments on Turkish as a morphologically rich language and on English, with a comparably poorer...
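The candidate-segmentation idea can be sketched compactly. In the toy code below, each candidate segmentation of a hypothetical word is embedded by averaging invented 2-d morpheme vectors, and a softmax attention scores candidates against a fixed query vector; in the actual model both the embeddings and the attention parameters are learned, so everything here is an illustrative assumption.

```python
import math

# Toy 2-d morpheme embeddings (invented values, for illustration only).
MORPH_EMB = {
    "ev": (1.0, 0.0), "ler": (0.0, 1.0),   # a plausible segmentation
    "evler": (0.4, 0.4),                    # the unsegmented word itself
    "e": (0.2, 0.2), "vler": (0.1, 0.1),    # an implausible segmentation
}

def seg_vec(seg):
    """Embed one candidate segmentation by averaging its morpheme vectors."""
    vecs = [MORPH_EMB[m] for m in seg]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def attend(candidates, query=(1.0, 1.0)):
    """Softmax attention over candidate segmentations; each candidate is
    scored by a dot product with a query vector (fixed here, learned in
    a real model)."""
    vecs = [seg_vec(c) for c in candidates]
    scores = [sum(q * x for q, x in zip(query, v)) for v in vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    word_vec = tuple(
        sum(w * v[d] for w, v in zip(weights, vecs)) for d in range(len(query))
    )
    return weights, word_vec

weights, vec = attend([("ev", "ler"), ("evler",), ("e", "vler")])
# The linguistically sensible split ("ev", "ler") receives the most weight.
```

The attention weights sum to one, so the final word vector is a convex combination of the candidate segmentation vectors.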
Abstract Naming common odors is a surprisingly difficult task: odors are frequently misnamed. Little is known about the linguistic properties of odor misnamings. We test whether the misnamings of older adults carry information about olfactory perception and its connection to lexical-semantic processing. We analyze the olfactory–semantic content and source of naming failures in a large sample of older adults in Sweden (n = 2479; age 58–100 years). We investigate which factors of semantic proximity to the target name predict how an odor is misnamed, and how these relate...
This paper presents the recent developments on the Turkish Discourse Bank (TDB). First, the resource is summarized and an evaluation is presented. Then, TDB 1.1, i.e. enrichments on 10% of the corpus, is described (namely, senses for explicit discourse connectives, and new annotations for three relation types - implicit relations, entity relations and alternative lexicalizations). The method of annotation is explained and the data are evaluated.
In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt connectives. This is challenging even for humans, leading to a shortage of annotated data, a fact that makes the task even more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through the explicitation of implicit relations, which reduces the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our...
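The explicitation idea described above can be sketched in a few lines: insert each candidate connective between the two sentences, let a language model score the resulting strings, and map the winning connective to a relation sense. The connective-to-sense mapping and the scoring function below are toy stand-ins, not the paper's actual model.

```python
# Toy mapping from connectives to coarse relation senses (illustrative).
CONNECTIVE_TO_RELATION = {
    "because": "Contingency",
    "however": "Comparison",
    "afterwards": "Temporal",
}

def classify_implicit(s1, s2, lm_score):
    """Reduce implicit relation classification to (1) language modeling
    and (2) a connective-to-sense lookup playing the role of explicit
    relation classification."""
    best = max(CONNECTIVE_TO_RELATION,
               key=lambda c: lm_score(f"{s1} {c} {s2}"))
    return best, CONNECTIVE_TO_RELATION[best]

def toy_score(text):
    """Stand-in for a real LM: rewards cue-word co-occurrence."""
    cues = {("because", "tired"), ("afterwards", "woke")}
    t = text.lower()
    return sum(1 for a, b in cues if a in t and b in t)

print(classify_implicit("He went to bed early.", "He was tired.", toy_score))
# → ('because', 'Contingency')
```

In the paper's setting the scorer would be a genuine language model; the sketch only shows how the two sub-problems compose.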
Abstract In response to the COVID-19 pandemic, the Psychological Science Accelerator coordinated three large-scale psychological studies to examine the effects of loss-gain framing, cognitive reappraisals, and autonomy framing manipulations on behavioral intentions and affective measures. The data collected (April to October 2020) included specific measures for each experimental study, a general questionnaire examining health prevention behaviors and COVID-19 experience, geographical and cultural context characterization,...
Pre-trained multilingual language models have become an important building block in Natural Language Processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has previously been assembled. We find that models of the XLM-RoBERTa family consistently show the best performance, by simultaneously being good monolingual models and degrading relatively little in the zero-shot setting...
Abstract To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some...
Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It is made even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. The system is evaluated on the discourse-annotated TED-MDB corpus, where it obtains good results...
We present a very simple method for parallel text cleaning of low-resource languages, based on the projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in a downstream machine translation evaluation.
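The cleaning step can be illustrated with a minimal sketch: assuming word embeddings from both languages have already been projected into a shared space (the vectors below are invented toy values, not real embeddings), sentence pairs are scored by the cosine similarity of their averaged word vectors and low-scoring pairs are discarded.

```python
import math

# Toy cross-lingual embeddings, assumed already projected into a shared
# space (e.g. via a learned linear mapping). Values are illustrative.
EMB = {
    "cat": (0.9, 0.1), "katt": (0.88, 0.12),
    "dog": (0.1, 0.9), "hund": (0.12, 0.88),
}

def sent_vec(tokens):
    """Average of known word vectors; None if no token is covered."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def clean(pairs, threshold=0.7):
    """Keep only pairs whose projected sentence vectors are similar."""
    kept = []
    for src, tgt in pairs:
        u, v = sent_vec(src.split()), sent_vec(tgt.split())
        if u and v and cosine(u, v) >= threshold:
            kept.append((src, tgt))
    return kept

pairs = [("cat", "katt"), ("cat", "hund")]
print(clean(pairs))  # → [('cat', 'katt')] — the mismatched pair is dropped
```

The threshold is a free parameter; in practice it would be tuned on a small held-out set.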
The single biggest obstacle to performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer discourse-level information in target languages through error-prone automatic means. The current paper aims to provide a more direct insight into cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consists of independently annotated six TED talks in seven different...
In this work, we introduce a lightweight discourse connective detection system. Employing gradient boosting trained on straightforward, low-complexity features, the proposed approach sidesteps the computational demands of current approaches that rely on deep neural networks. Considering its simplicity, our system achieves competitive results while offering significant gains in terms of inference time, even on CPU. Furthermore, its stable performance across two unrelated languages suggests the robustness of the system in multilingual...
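A hedged sketch of what "straightforward, low-complexity features" could look like for such a detector: cheap surface features per token, computed without a parser or neural encoder, suitable as input to a gradient-boosted classifier. The feature set and connective lexicon below are illustrative assumptions, not the paper's actual ones.

```python
# Illustrative connective lexicon (not the paper's resource).
CONNECTIVES = frozenset({"however", "because", "but", "so", "although"})

def connective_features(tokens, i):
    """Cheap surface features for token i; this is the kind of
    low-complexity input a gradient-boosted detector could consume."""
    tok = tokens[i].lower()
    return {
        "form": tok,
        "in_lexicon": tok in CONNECTIVES,   # matches a known connective form
        "sent_initial": i == 0,
        "after_punct": i > 0 and tokens[i - 1] in {",", ";", ".", ":"},
        "next_form": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

toks = "However , the model failed .".split()
feats = connective_features(toks, 0)
print(feats["in_lexicon"], feats["sent_initial"])  # → True True
```

Because every feature is a constant-time lookup, feature extraction stays fast on CPU, which is the point of the approach.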
Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages that are highly prone to be inflected. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags using conditional random fields (CRF), and we employ part-of-speech (PoS) tagging with hidden Markov models (HMMs) to mitigate sparsity. Results show that using morpheme tags in PoS tagging helps alleviate the sparsity in emission probabilities. Our model...
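The sparsity argument can be made concrete with a small sketch. The paper conditions HMM emissions on CRF-predicted morpheme tags; below, that is crudely approximated by a fixed suffix list, so an unseen inflected word still shares emission statistics with seen words carrying the same suffix. The word forms, suffixes, and counts are invented for illustration.

```python
from collections import defaultdict

# Crude stand-in for CRF-predicted morpheme tags: a fixed suffix list,
# ordered longest-first. Purely illustrative, not real Turkish morphology.
SUFFIXES = ("lerde", "ler", "di", "de")

def suffix_tag(word):
    """Longest matching suffix (a stand-in for a morpheme tag)."""
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            return s
    return "<none>"

# Toy tagged sentences (word, PoS) — hypothetical data.
corpus = [
    [("ev", "NOUN"), ("geldi", "VERB")],
    [("evlerde", "NOUN"), ("kaldi", "VERB")],
]

emissions = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for w, t in sent:
        emissions[t][suffix_tag(w)] += 1

def emission_prob(tag, word, alpha=0.5):
    """Emission over morpheme tags instead of full word forms, with
    add-alpha smoothing, so unseen inflected words get sensible mass."""
    row = emissions[tag]
    vocab = len(SUFFIXES) + 1  # suffixes plus "<none>"
    return (row[suffix_tag(word)] + alpha) / (sum(row.values()) + alpha * vocab)

# "kitaplerde" was never seen, but its suffix was observed under NOUN.
print(emission_prob("NOUN", "kitaplerde") > emission_prob("VERB", "kitaplerde"))
# → True
```

This is exactly the sparsity relief the abstract describes: emission tables are keyed by a small closed set of morpheme tags rather than an open vocabulary of inflected forms.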
This paper presents our submission to the first Shared Task on Multilingual Grammatical Error Detection (MultiGED-2023). Our method utilizes a transformer-based sequence-to-sequence model, which was trained on a synthetic dataset consisting of 3.2 billion words. We adopt a distantly supervised approach, with the training process relying exclusively on the distribution of language learners' errors extracted from the annotated corpus used to construct the data. In the Swedish track, our model ranks fourth out of seven submissions in...
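The synthetic data step can be sketched as follows: clean sentences are corrupted with error types sampled from a learner-error distribution, yielding (noisy, clean) training pairs for a sequence-to-sequence model. The error types and frequencies below are invented, not the distribution extracted from the paper's annotated corpus.

```python
import random

# Hypothetical learner-error distribution (invented frequencies).
ERROR_DIST = [("drop_word", 0.3), ("swap_adjacent", 0.3), ("none", 0.4)]

def sample_op(rng):
    """Sample an error type according to ERROR_DIST."""
    r, acc = rng.random(), 0.0
    for op, p in ERROR_DIST:
        acc += p
        if r < acc:
            return op
    return "none"

def corrupt(tokens, rng):
    """Return a noisy copy of `tokens` to pair with the clean original."""
    op = sample_op(rng)
    toks = list(tokens)
    if op == "drop_word" and len(toks) > 1:
        del toks[rng.randrange(len(toks))]
    elif op == "swap_adjacent" and len(toks) > 1:
        i = rng.randrange(len(toks) - 1)
        toks[i], toks[i + 1] = toks[i + 1], toks[i]
    return toks

rng = random.Random(0)
source = "she has a small dog".split()
pairs = [(corrupt(source, rng), source) for _ in range(3)]  # (noisy, clean)
```

Scaling this kind of procedure over a large clean corpus is what produces a billions-of-words synthetic training set without any manually labeled target-language errors.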