- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Multimodal Machine Learning Applications
- Speech Recognition and Synthesis
- COVID-19 and Mental Health
- Sentiment Analysis and Opinion Mining
- Behavioral Health and Interventions
- Speech and Dialogue Systems
- Mental Health Research Topics
- Semantic Web and Ontologies
- Social and Intergroup Psychology
- Discourse Analysis in Language Studies
- Advanced Text Analysis Techniques
- Misinformation and Its Impacts
- Child and Adolescent Psychosocial and Emotional Development
- Advanced Chemical Sensor Technologies
- COVID-19 Diagnosis Using AI
- Text and Document Classification Technologies
- Language Development and Disorders
- Wine Industry and Tourism
- Customer Service Quality and Loyalty
- Language, Metaphor, and Cognition
- Multi-Agent Systems and Negotiation
- Energy Efficient Wireless Sensor Networks
RISE Research Institutes of Sweden
2024-2025
Stockholm University
2019-2024
Dartmouth College
2021
University of Stuttgart
2021
Uppsala University
2021
University of Duisburg-Essen
2021
East Stroudsburg University
2021
Middle East Technical University
2016-2020
Åbo Akademi University
2020
Piaggio (Italy)
2020
Abstract This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in the resources used to train models for Natural Language Processing tasks which, like GEC itself, have implications for Learner...
In this paper, we investigate the effects of using subword information in representation learning. We argue that using syntactic units affects the quality of word representations positively. We introduce a morpheme-based model and compare it against word-based, character-based, and character n-gram level models. Our model takes a list of candidate segmentations for a word and learns a representation where the different segmentations are weighted by an attention mechanism. We performed experiments on Turkish as a morphologically rich language and on English, with a comparably poorer...
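The candidate-segmentation idea can be sketched compactly. In the toy code below, each candidate segmentation of a hypothetical word is embedded by averaging invented 2-d morpheme vectors, and a softmax attention scores candidates against a fixed query vector; in the actual model both the embeddings and the attention parameters are learned, so everything here is an illustrative assumption.

```python
import math

# Toy 2-d morpheme embeddings (invented values, for illustration only).
MORPH_EMB = {
    "ev": (1.0, 0.0), "ler": (0.0, 1.0),   # a plausible segmentation
    "evler": (0.4, 0.4),                    # the unsegmented word itself
    "e": (0.2, 0.2), "vler": (0.1, 0.1),    # an implausible segmentation
}

def seg_vec(seg):
    """Embed one candidate segmentation by averaging its morpheme vectors."""
    vecs = [MORPH_EMB[m] for m in seg]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def attend(candidates, query=(1.0, 1.0)):
    """Softmax attention over candidate segmentations; each candidate is
    scored by a dot product with a query vector (fixed here, learned in
    a real model)."""
    vecs = [seg_vec(c) for c in candidates]
    scores = [sum(q * x for q, x in zip(query, v)) for v in vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    word_vec = tuple(
        sum(w * v[d] for w, v in zip(weights, vecs)) for d in range(len(query))
    )
    return weights, word_vec

weights, vec = attend([("ev", "ler"), ("evler",), ("e", "vler")])
# The linguistically sensible split ("ev", "ler") receives the most weight.
```

The attention weights sum to one, so the final word vector is a convex combination of the candidate segmentation vectors.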
Abstract Naming common odors is a surprisingly difficult task: odors are frequently misnamed. Little is known about the linguistic properties of odor misnamings. We test whether the misnamings of older adults carry information about olfactory perception and its connection to lexical-semantic processing. We analyze the olfactory–semantic content and source of naming failures in a large sample of older adults in Sweden (n = 2479; age 58–100 years). We investigate which factors of semantic proximity to the target name predict how an odor is misnamed, and how these relate...
This paper presents the recent developments on the Turkish Discourse Bank (TDB). First, the resource is summarized and an evaluation is presented. Then, TDB 1.1, i.e. enrichments on 10% of the corpus, is described (namely, senses for explicit discourse connectives, and new annotations for three relation types - implicit relations, entity relations and alternative lexicalizations). The method of annotation is explained and the data are evaluated.
In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt connectives. This is challenging even for humans, leading to a shortage of annotated data, a fact that makes the task even more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through the explicitation of implicit relations, which reduces the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our...
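The explicitation idea described above can be sketched in a few lines: insert each candidate connective between the two sentences, let a language model score the resulting strings, and map the winning connective to a relation sense. The connective-to-sense mapping and the scoring function below are toy stand-ins, not the paper's actual model.

```python
# Toy mapping from connectives to coarse relation senses (illustrative).
CONNECTIVE_TO_RELATION = {
    "because": "Contingency",
    "however": "Comparison",
    "afterwards": "Temporal",
}

def classify_implicit(s1, s2, lm_score):
    """Reduce implicit relation classification to (1) language modeling
    and (2) a connective-to-sense lookup playing the role of explicit
    relation classification."""
    best = max(CONNECTIVE_TO_RELATION,
               key=lambda c: lm_score(f"{s1} {c} {s2}"))
    return best, CONNECTIVE_TO_RELATION[best]

def toy_score(text):
    """Stand-in for a real LM: rewards cue-word co-occurrence."""
    cues = {("because", "tired"), ("afterwards", "woke")}
    t = text.lower()
    return sum(1 for a, b in cues if a in t and b in t)

print(classify_implicit("He went to bed early.", "He was tired.", toy_score))
# → ('because', 'Contingency')
```

In the paper's setting the scorer would be a genuine language model; the sketch only shows how the two sub-problems compose.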
Abstract In response to the COVID-19 pandemic, the Psychological Science Accelerator coordinated three large-scale psychological studies to examine the effects of loss-gain framing, cognitive reappraisals, and autonomy framing manipulations on behavioral intentions and affective measures. The data collected (April to October 2020) included specific measures for each experimental study, a general questionnaire examining health prevention behaviors and COVID-19 experience, geographical and cultural context characterization,...
Pre-trained multilingual language models have become an important building block in Natural Language Processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has previously been assembled. We find that models of the XLM-RoBERTa family consistently show the best performance, by simultaneously being good monolingual models and degrading relatively little in the zero-shot setting...
Abstract To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some...
Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It is made even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. The system is evaluated on the discourse-annotated TED-MDB corpus, where it obtains good results...
We present a very simple method for parallel text cleaning of low-resource languages, based on the projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in a downstream machine translation evaluation.
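The cleaning step can be illustrated with a minimal sketch: assuming word embeddings from both languages have already been projected into a shared space (the vectors below are invented toy values, not real embeddings), sentence pairs are scored by the cosine similarity of their averaged word vectors and low-scoring pairs are discarded.

```python
import math

# Toy cross-lingual embeddings, assumed already projected into a shared
# space (e.g. via a learned linear mapping). Values are illustrative.
EMB = {
    "cat": (0.9, 0.1), "katt": (0.88, 0.12),
    "dog": (0.1, 0.9), "hund": (0.12, 0.88),
}

def sent_vec(tokens):
    """Average of known word vectors; None if no token is covered."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def clean(pairs, threshold=0.7):
    """Keep only pairs whose projected sentence vectors are similar."""
    kept = []
    for src, tgt in pairs:
        u, v = sent_vec(src.split()), sent_vec(tgt.split())
        if u and v and cosine(u, v) >= threshold:
            kept.append((src, tgt))
    return kept

pairs = [("cat", "katt"), ("cat", "hund")]
print(clean(pairs))  # → [('cat', 'katt')] — the mismatched pair is dropped
```

The threshold is a free parameter; in practice it would be tuned on a small held-out set.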
The single biggest obstacle to performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer discourse-level information in target languages through error-prone automatic means. The current paper aims to provide a more direct insight into cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consists of independently annotated six TED talks in seven different...
In this work, we introduce a lightweight discourse connective detection system. Employing gradient boosting trained on straightforward, low-complexity features, the proposed approach sidesteps the computational demands of current approaches that rely on deep neural networks. Considering its simplicity, our system achieves competitive results while offering significant gains in terms of inference time, even on CPU. Furthermore, its stable performance across two unrelated languages suggests the robustness of the system in multilingual...
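A hedged sketch of what "straightforward, low-complexity features" could look like for such a detector: cheap surface features per token, computed without a parser or neural encoder, suitable as input to a gradient-boosted classifier. The feature set and connective lexicon below are illustrative assumptions, not the paper's actual ones.

```python
# Illustrative connective lexicon (not the paper's resource).
CONNECTIVES = frozenset({"however", "because", "but", "so", "although"})

def connective_features(tokens, i):
    """Cheap surface features for token i; this is the kind of
    low-complexity input a gradient-boosted detector could consume."""
    tok = tokens[i].lower()
    return {
        "form": tok,
        "in_lexicon": tok in CONNECTIVES,   # matches a known connective form
        "sent_initial": i == 0,
        "after_punct": i > 0 and tokens[i - 1] in {",", ";", ".", ":"},
        "next_form": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

toks = "However , the model failed .".split()
feats = connective_features(toks, 0)
print(feats["in_lexicon"], feats["sent_initial"])  # → True True
```

Because every feature is a constant-time lookup, feature extraction stays fast on CPU, which is the point of the approach.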
Sparsity is one of the major problems in natural language processing. The problem becomes even more severe in agglutinating languages that are highly prone to be inflected. We deal with sparsity in Turkish by adopting morphological features for part-of-speech tagging. We learn inflectional and derivational morpheme tags using conditional random fields (CRF), and we employ part-of-speech (PoS) tagging with hidden Markov models (HMMs) to mitigate sparsity. Results show that using morpheme tags in PoS tagging helps alleviate the sparsity in emission probabilities. Our model...
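The sparsity argument can be made concrete with a small sketch. The paper conditions HMM emissions on CRF-predicted morpheme tags; below, that is crudely approximated by a fixed suffix list, so an unseen inflected word still shares emission statistics with seen words carrying the same suffix. The word forms, suffixes, and counts are invented for illustration.

```python
from collections import defaultdict

# Crude stand-in for CRF-predicted morpheme tags: a fixed suffix list,
# ordered longest-first. Purely illustrative, not real Turkish morphology.
SUFFIXES = ("lerde", "ler", "di", "de")

def suffix_tag(word):
    """Longest matching suffix (a stand-in for a morpheme tag)."""
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            return s
    return "<none>"

# Toy tagged sentences (word, PoS) — hypothetical data.
corpus = [
    [("ev", "NOUN"), ("geldi", "VERB")],
    [("evlerde", "NOUN"), ("kaldi", "VERB")],
]

emissions = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for w, t in sent:
        emissions[t][suffix_tag(w)] += 1

def emission_prob(tag, word, alpha=0.5):
    """Emission over morpheme tags instead of full word forms, with
    add-alpha smoothing, so unseen inflected words get sensible mass."""
    row = emissions[tag]
    vocab = len(SUFFIXES) + 1  # suffixes plus "<none>"
    return (row[suffix_tag(word)] + alpha) / (sum(row.values()) + alpha * vocab)

# "kitaplerde" was never seen, but its suffix was observed under NOUN.
print(emission_prob("NOUN", "kitaplerde") > emission_prob("VERB", "kitaplerde"))
# → True
```

This is exactly the sparsity relief the abstract describes: emission tables are keyed by a small closed set of morpheme tags rather than an open vocabulary of inflected forms.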
This paper presents our submission to the first Shared Task on Multilingual Grammatical Error Detection (MultiGED-2023). Our method utilizes a transformer-based sequence-to-sequence model, which was trained on a synthetic dataset consisting of 3.2 billion words. We adopt a distantly supervised approach, with the training process relying exclusively on the distribution of language learners' errors extracted from the annotated corpus used to construct the data. In the Swedish track, our model ranks fourth out of seven submissions in...
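The synthetic data step can be sketched as follows: clean sentences are corrupted with error types sampled from a learner-error distribution, yielding (noisy, clean) training pairs for a sequence-to-sequence model. The error types and frequencies below are invented, not the distribution extracted from the paper's annotated corpus.

```python
import random

# Hypothetical learner-error distribution (invented frequencies).
ERROR_DIST = [("drop_word", 0.3), ("swap_adjacent", 0.3), ("none", 0.4)]

def sample_op(rng):
    """Sample an error type according to ERROR_DIST."""
    r, acc = rng.random(), 0.0
    for op, p in ERROR_DIST:
        acc += p
        if r < acc:
            return op
    return "none"

def corrupt(tokens, rng):
    """Return a noisy copy of `tokens` to pair with the clean original."""
    op = sample_op(rng)
    toks = list(tokens)
    if op == "drop_word" and len(toks) > 1:
        del toks[rng.randrange(len(toks))]
    elif op == "swap_adjacent" and len(toks) > 1:
        i = rng.randrange(len(toks) - 1)
        toks[i], toks[i + 1] = toks[i + 1], toks[i]
    return toks

rng = random.Random(0)
source = "she has a small dog".split()
pairs = [(corrupt(source, rng), source) for _ in range(3)]  # (noisy, clean)
```

Scaling this kind of procedure over a large clean corpus is what produces a billions-of-words synthetic training set without any manually labeled target-language errors.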