Murathan Kurfalı

ORCID: 0000-0002-7020-8275
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Multimodal Machine Learning Applications
  • Speech Recognition and Synthesis
  • COVID-19 and Mental Health
  • Sentiment Analysis and Opinion Mining
  • Behavioral Health and Interventions
  • Speech and dialogue systems
  • Mental Health Research Topics
  • Semantic Web and Ontologies
  • Social and Intergroup Psychology
  • Discourse Analysis in Language Studies
  • Advanced Text Analysis Techniques
  • Misinformation and Its Impacts
  • Child and Adolescent Psychosocial and Emotional Development
  • Advanced Chemical Sensor Technologies
  • COVID-19 diagnosis using AI
  • Text and Document Classification Technologies
  • Language Development and Disorders
  • Wine Industry and Tourism
  • Customer Service Quality and Loyalty
  • Language, Metaphor, and Cognition
  • Multi-Agent Systems and Negotiation
  • Energy Efficient Wireless Sensor Networks

RISE Research Institutes of Sweden
2024-2025

Stockholm University
2019-2024

Dartmouth College
2021

University of Stuttgart
2021

Uppsala University
2021

University of Duisburg-Essen
2021

East Stroudsburg University
2021

Middle East Technical University
2016-2020

Åbo Akademi University
2020

Piaggio (Italy)
2020

Abstract This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in the resources used to train models for Natural Language Processing tasks which, as GEC itself, have implications for Learner...

10.1075/ijlcr.24033.mas article EN International Journal of Learner Corpus Research 2025-04-01

In this paper, we investigate the effects of using subword information in representation learning. We argue that syntactic units affect the quality of word representations positively. We introduce a morpheme-based model and compare it against word-based, character-based, and character n-gram level models. Our model takes a list of candidate segmentations of a word and learns a representation in which the different segmentations are weighted by an attention mechanism. We performed experiments on Turkish as a morphologically rich language and English with comparably poorer...

10.18653/v1/w18-3019 article EN cc-by 2018-01-01
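
As an illustration of the attention mechanism described above, here is a minimal PyTorch sketch; the class name, dimensions, and the mean-pooling of morpheme embeddings are assumptions for illustration, not the paper's exact architecture. Each candidate segmentation is embedded, and the word vector is an attention-weighted sum over the candidates.

    import torch
    import torch.nn as nn

    class SegmentationAttention(nn.Module):
        # attention-weighted word representation over candidate
        # morphological segmentations (illustrative sketch only)
        def __init__(self, n_morphemes, dim):
            super().__init__()
            self.morph_emb = nn.Embedding(n_morphemes, dim)
            self.att = nn.Linear(dim, 1)

        def forward(self, segmentations):
            # segmentations: list of 1-D LongTensors of morpheme ids,
            # one tensor per candidate segmentation of the same word
            cands = torch.stack([self.morph_emb(s).mean(dim=0)
                                 for s in segmentations])      # (n_cand, dim)
            weights = torch.softmax(self.att(cands).squeeze(-1), dim=0)
            return (weights.unsqueeze(-1) * cands).sum(dim=0)  # (dim,)

    model = SegmentationAttention(n_morphemes=1000, dim=64)
    # two candidate segmentations of one word, as morpheme ids
    vec = model([torch.tensor([3, 17]), torch.tensor([3, 5, 9])])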

Abstract Naming common odors is a surprisingly difficult task: Odors are frequently misnamed. Little is known about the linguistic properties of odor misnamings. We test whether the misnamings of older adults carry information about olfactory perception and its connection to lexical‐semantic processing. We analyze the olfactory–semantic content and source of naming failures in a large sample of older adults in Sweden (n = 2479; age 58–100 years). We investigate which factors of semantic proximity to the target name predict how odors are misnamed, and how these relate...

10.1111/cogs.70003 article EN cc-by-nc Cognitive Science 2024-10-01
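
One way to operationalize the semantic proximity factor mentioned above is cosine similarity between word embeddings of the produced name and the target odor name; a toy sketch, where the vectors are made-up stand-ins for pretrained embeddings:

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # toy vectors standing in for pretrained word embeddings
    vectors = {"lemon":    np.array([0.9, 0.1, 0.2]),
               "orange":   np.array([0.8, 0.2, 0.3]),
               "gasoline": np.array([0.1, 0.9, 0.4])}

    # a semantically close misnaming scores higher than a distant one
    print(cosine(vectors["orange"], vectors["lemon"]))
    print(cosine(vectors["gasoline"], vectors["lemon"]))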

This paper presents the recent developments on the Turkish Discourse Bank (TDB). First, the resource is summarized and an evaluation is presented. Then, TDB 1.1, i.e. enrichments on 10% of the corpus, are described (namely, senses for explicit discourse connectives, and new annotations for three relation types - implicit relations, entity relations and alternative lexicalizations). The method of annotation is explained and the data are evaluated.

10.18653/v1/w17-0809 article EN cc-by 2017-01-01

In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt discourse connectives. This is challenging even for humans, leading to a shortage of annotated data, a fact that makes the task more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through the explicitation of implicit relations to reduce the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our...

10.18653/v1/2021.unimplicit-1.1 article EN cc-by 2021-01-01
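
A minimal sketch of the explicitation idea, under assumptions (BERT as the language model, a toy connective-to-sense mapping in place of a PDTB-style one): a masked language model scores candidate connectives between the two sentences, and the best-scoring connective is mapped to a relation sense, reducing the implicit task to an explicit one.

    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    s1 = "The meeting ran long."
    s2 = "everyone missed lunch."
    # toy connective -> sense mapping (illustrative, not the paper's)
    senses = {"because": "Contingency", "but": "Comparison",
              "so": "Contingency", "then": "Temporal"}

    # score each candidate connective in the masked slot
    preds = fill(f"{s1} [MASK] {s2}", targets=list(senses))
    best = max(preds, key=lambda p: p["score"])["token_str"]
    print(best, "->", senses[best])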
Erin Michelle Buchanan Savannah C Lewis Bastien Paris Patrick S. Forscher Jeffrey M. Pavlacic Julie Beshears Shira Meir Drexler Amélie Gourdon-Kanhukamwe Peter Robert Mallik Miguel Alejandro A. Silan Jeremy K. Miller Hans IJzerman Hannah Moshontz Jennifer L Beaudry Jordan W. Suchow Christopher R. Chartier Nicholas A. Coles MohammadHasan Sharifian Anna Louise Todsen Carmel Levitan Flávio Azevedo Nicole Legate Blake Heller Alexander Rothman Charles Dorison Brian Gill Ke Wang Vaughan W. Rees Nancy Gibbs Amit Goldenberg Thuy-vy Thi Nguyen James J. Gross Gwenaêl Kaminski Claudia C. von Bastian Mariola Paruzel‐Czachura Farnaz Mosannenzadeh Soufian Azouaghe Alexandre Bran Susana Ruiz Fernández Anabela Caetano Santos Niv Reggev Janis Zickfeld Handan Akkaş Myrto Pantazi Ivan Ropovik Max Korbmacher Patrícia Arriaga Biljana Gjoneska Lara Warmelink Sara G. Alves Gabriel Lins de Holanda Coelho Stefan Stieger Vidar Schei Paul H. P. Hanel Barnabás Szászi Maksim Fedotov Jan Antfolk Gabriela Mariana Marcu Jana Schrötter Jonas R. Kunst Sandra J. Geiger Adeyemi Adetula Halil Emre Kocalar Julita Kielińska Pavol Kačmár Ahmed Bokkour Oscar J. Galindo-Caballero Ikhlas Djamai Sara Johanna Pöntinen Bamikole Emmanuel Agesin Teodor Jernsäther Anum Urooj Nikolay R. Rachev María Koptjevskaja-Tamm Murathan Kurfalı Ilse L. Pit Ranran Li Sami Çoksan Dmitrii Dubrov Tamar Paltrow Gabriel Baník Tatiana Korobova Anna Studzińska Xiaoming Jiang John Jamir Benzon R. Aruta Jáchym Vintr Faith Chiu Lada Kaliská Jana Berkessel Murat Tümer Sara Morales-Izquierdo Hu Chuan-Peng Kévin Vezirian Anna Dalla Rosa Olga Białobrzeska Martin R. Vasilev Julia Beitner Ondřej Kácha Barbara Žuro Minja Westerlund

Abstract In response to the COVID-19 pandemic, the Psychological Science Accelerator coordinated three large-scale psychological studies to examine the effects of loss-gain framing, cognitive reappraisals, and autonomy framing manipulations on behavioral intentions and affective measures. The data collected (April to October 2020) included specific measures for each experimental study, a general questionnaire examining health prevention behaviors and experience, geographical and cultural context characterization,...

10.1038/s41597-022-01811-7 article EN cc-by Scientific Data 2023-02-11

Pre-trained multilingual language models have become an important building block in Natural Language Processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has previously been assembled. We find that the XLM-RoBERTa family of models consistently shows the best performance, by simultaneously being good monolingual models and degrading relatively little in the zero-shot setting....

10.18653/v1/2021.repl4nlp-1.2 article EN cc-by 2021-01-01
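
A sketch of the zero-shot evaluation setup, under assumptions (mean-pooled XLM-R sentence vectors, an invented toy task and data): a probe is trained on English representations and applied unchanged to another language.

    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
    enc = AutoModel.from_pretrained("xlm-roberta-base")

    def embed(sentences):
        batch = tok(sentences, padding=True, return_tensors="pt")
        with torch.no_grad():
            out = enc(**batch).last_hidden_state
        return out.mean(dim=1).numpy()   # mean-pooled sentence vectors

    # train a probe on English, test zero-shot on Swedish (toy labels)
    X_en = embed(["He fell because he was pushed.", "He fell and she laughed."])
    y_en = [1, 0]                        # e.g. causal vs. non-causal
    probe = LogisticRegression().fit(X_en, y_en)
    print(probe.predict(embed(["Han föll eftersom han blev knuffad."])))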

Abstract To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases as well as a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some...

10.1162/coli_a_00491 article EN cc-by-nc-nd Computational Linguistics 2023-01-01
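
The annotation projection step can be illustrated with a minimal sketch (the function and the toy alignment below are invented for illustration): source-side tags are copied onto the target words they are aligned with.

    # annotation projection: copy source-side tags to aligned target words
    def project_tags(src_tags, alignment, tgt_len):
        # alignment: list of (src_idx, tgt_idx) word-alignment pairs
        tgt_tags = ["UNK"] * tgt_len
        for s, t in alignment:
            tgt_tags[t] = src_tags[s]
        return tgt_tags

    src_tags = ["DET", "NOUN", "VERB"]           # "the dog sleeps"
    alignment = [(0, 0), (1, 1), (2, 2)]         # from a word aligner
    print(project_tags(src_tags, alignment, 3))  # ['DET', 'NOUN', 'VERB']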

Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It becomes even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TED-MDB corpus, where it obtains good results...

10.18653/v1/w19-5927 article EN cc-by 2019-01-01

We present a very simple method for parallel text cleaning of low-resource languages, based on the projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.

10.18653/v1/w19-5438 article EN cc-by 2019-01-01
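
A sketch of the cleaning idea under stated assumptions (toy three-dimensional vectors in place of projected embeddings, a made-up threshold): each sentence is the average of its word vectors in a shared space, and pairs whose cosine similarity falls below a threshold are discarded.

    import numpy as np

    def sent_vec(tokens, emb, dim=3):
        vecs = [emb[t] for t in tokens if t in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def keep_pair(src, tgt, emb_src, emb_tgt, threshold=0.5):
        u, v = sent_vec(src, emb_src), sent_vec(tgt, emb_tgt)
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return bool(denom) and float(np.dot(u, v) / denom) >= threshold

    # toy embeddings assumed already projected into a shared space
    emb_en = {"dog": np.array([1.0, 0.0, 0.1]), "barks": np.array([0.0, 1.0, 0.1])}
    emb_sv = {"hunden": np.array([0.9, 0.1, 0.1]), "skäller": np.array([0.1, 0.9, 0.1])}
    print(keep_pair(["dog", "barks"], ["hunden", "skäller"], emb_en, emb_sv))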

To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages. The learned language representations are then compared to existing typological databases as well as a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some...

10.48550/arxiv.2301.08115 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The single biggest obstacle in performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer discourse-level information in target languages through error-prone automatic means. The current paper aims to provide a more direct insight into cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consists of six TED talks independently annotated in seven different...

10.3233/sw-223011 article EN other-oa Semantic Web 2022-06-21

In this work, we introduce a lightweight discourse connective detection system. Employing gradient boosting trained on straightforward, low-complexity features, the proposed approach sidesteps the computational demands of current approaches that rely on deep neural networks. Considering its simplicity, our system achieves competitive results while offering significant gains in terms of inference time, even on CPU. Furthermore, its stable performance across two unrelated languages suggests the robustness of the system in multilingual...

10.48550/arxiv.2404.13793 preprint EN arXiv (Cornell University) 2024-04-21
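
A minimal sketch of such a detector, with invented features and toy data (hashed token identities in a one-word window; the actual feature set is not reproduced here):

    from sklearn.ensemble import GradientBoostingClassifier

    def features(tokens, i):
        # low-complexity features: hashed token identity in a one-word window
        left = tokens[i - 1] if i > 0 else "<s>"
        right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        return [hash(tokens[i]) % 1000, hash(left) % 1000, hash(right) % 1000]

    sent = "I stayed home because it rained".split()
    X = [features(sent, i) for i in range(len(sent))]
    y = [0, 0, 0, 1, 0, 0]               # "because" is a connective
    clf = GradientBoostingClassifier().fit(X, y)
    print(clf.predict([features(sent, 3)]))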

Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages that are highly prone to be inflected. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags using conditional random fields (CRF), and we employ part-of-speech (PoS) tagging using hidden Markov models (HMMs) to mitigate sparsity. Results show that using morpheme tags in PoS tagging helps alleviate the sparsity in emission probabilities. Our model...

10.48550/arxiv.1703.03200 preprint EN other-oa arXiv (Cornell University) 2017-01-01
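
A toy sketch of why morpheme tags help with emission sparsity (the tag inventory and smoothing constants below are invented): emission probabilities are estimated over a small set of morpheme tags rather than over a huge vocabulary of inflected word forms.

    from collections import Counter, defaultdict

    # counts of morpheme tags per PoS, instead of raw inflected forms
    emissions = defaultdict(Counter)
    corpus = [("evlerde", "Noun", ("ev", "A3pl", "Loc")),
              ("geldi", "Verb", ("gel", "Past"))]
    for word, pos, morph_tags in corpus:
        emissions[pos].update(morph_tags)

    def emit_prob(pos, morph_tags, tagset_size=100):
        # add-one smoothed product of per-tag emission probabilities
        total = sum(emissions[pos].values())
        p = 1.0
        for m in morph_tags:
            p *= (emissions[pos][m] + 1) / (total + tagset_size)
        return p

    print(emit_prob("Noun", ("ev", "A3pl", "Loc")))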

This paper presents our submission to the first Shared Task on Multilingual Grammatical Error Detection (MultiGED-2023). Our method utilizes a transformer-based sequence-to-sequence model, which was trained on a synthetic dataset consisting of 3.2 billion words. We adopt a distantly supervised approach, with the training process relying exclusively on the distribution of language learners’ errors extracted from the annotated corpus used to construct the data. In the Swedish track, our model ranks fourth out of seven submissions in...

10.3384/ecp197004 article EN cc-by Linköping electronic conference proceedings 2023-05-16
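
The synthetic data generation can be sketched as sampling corruptions from an empirical error distribution (the distribution and the operations below are invented placeholders); the resulting corrupted/clean sentence pairs then train the sequence-to-sequence corrector.

    import random

    # placeholder for an error distribution extracted from a learner corpus
    error_dist = {"delete_word": 0.2, "swap_adjacent": 0.3, "keep": 0.5}

    def corrupt(tokens):
        out = list(tokens)
        op = random.choices(list(error_dist), weights=list(error_dist.values()))[0]
        if op == "delete_word" and len(out) > 1:
            out.pop(random.randrange(len(out)))
        elif op == "swap_adjacent" and len(out) > 1:
            i = random.randrange(len(out) - 1)
            out[i], out[i + 1] = out[i + 1], out[i]
        return out

    clean = "she walks to school every day".split()
    print(corrupt(clean), "<-", clean)   # (corrupted, clean) training pair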