- Natural Language Processing Techniques
- Topic Modeling
- Sentiment Analysis and Opinion Mining
- Text Readability and Simplification
- Advanced Text Analysis Techniques
- Semantic Web and Ontologies
- Translation Studies and Practices
- Speech and Dialogue Systems
- Authorship Attribution and Profiling
- Text and Document Classification Technologies
- Speech Recognition and Synthesis
- Lexicography and Language Studies
- Recommender Systems and Techniques
- Language, Metaphor, and Cognition
- Emotion and Mood Recognition
- Digital Communication and Language
- Mental Health via Writing
- Media Studies and Communication
- Innovations in Medical Education
- Misinformation and Its Impacts
- Mathematics, Computing, and Information Processing
- Media Influence and Politics
- Humor Studies and Applications
- Service-Oriented Architecture and Web Services
- Web Data Mining and Analysis
Ghent University Hospital
2016-2025
Ghent University
2014-2023
Language Science (South Korea)
2022-2023
Université du Québec à Montréal
2023
Tokyo University of Foreign Studies
2023
Institut Alfred Fournier
2023
Artevelde University College Ghent
2022
Bar-Ilan University
2021
University of Helsinki
2021
Tel Aviv University
2021
Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeniy Kotelnikov, Nuria Bel, Salud María Jiménez-Zafra, Gülşen Eryiğit. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). 2016.
Past shared tasks on emotions use data with both overt expressions of emotions (I am so happy to see you!) as well as subtle expressions where the emotions have to be inferred, for instance from event descriptions. Further, most datasets do not focus on the cause or the stimulus of the emotion. Here, for the first time, we propose a shared task where systems have to predict the emotions in a large automatically labeled dataset of tweets without access to words denoting emotions. Based on this intention, we call this the Implicit Emotion Shared Task (IEST) because the systems have to infer the emotion mostly from the context. Every tweet has...
In this article we present a corpus-based statistical approach to measuring translation quality, more particularly translation acceptability, by comparing the features of translated and original texts. We discuss initial findings that aim to support and objectify formative quality assessment. To this end, we extract a multitude of linguistic and textual features from both student and professional translation corpora that consist of many different translations by several translators in two genres (fiction, news) and two translation directions (English, French, Dutch). The numerical...
This paper presents the Dutch Parallel Corpus, a high-quality parallel corpus for Dutch, French and English consisting of more than ten million words. The corpus contains five different text types and is balanced with respect to text type and translation direction. All texts included in the corpus have been cleared from copyright. We discuss the importance of parallel corpora in various research domains and contrast the Dutch Parallel Corpus with existing parallel corpora. The Dutch Parallel Corpus distinguishes itself from other parallel corpora by having a balanced composition and by its availability to the wide research community, thanks to its copyright...
This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in resources used to train models for Natural Language Processing tasks which, as GEC itself, have implications for Learner...
While human annotation is crucial for many natural language processing tasks, it is often very expensive and time-consuming. Inspired by previous work on crowdsourcing, we investigate the viability of using non-expert labels instead of gold standard annotations from experts for a machine learning approach to automatic readability prediction. In order to do so, we evaluate two different methodologies to assess the readability of a wide variety of text material: a more traditional setup in which expert readers make readability judgments...
Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, although NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for English and Dutch generic text using supervised machine learning. Based on readability assessments by both experts...
This paper reports on the NLP4CALL shared task on Multilingual Grammatical Error Detection (MultiGED-2023), which included five languages: Czech, English, German, Italian and Swedish. It is the first shared task organized by the Computational SLA working group, whose aim is to promote less represented languages in the fields of Grammatical Error Detection and Correction, and other related fields. The MultiGED datasets have been produced based on second language (L2) learner corpora for each particular language. In this paper we introduce the task as a whole, elaborate...
As social media constitutes a valuable source for data analysis for a wide range of applications, the need for handling such data arises. However, the nonstandard language used on social media poses problems for natural language processing (NLP) tools, as these are typically trained on standard language material. We propose a text normalization approach to tackle this problem. More specifically, we investigate the usefulness of a multimodular approach to account for the diversity of normalization issues encountered in user-generated content (UGC). We consider three different types of UGC written...
Whereas post-edited texts have been shown to be either of comparable quality to human translations or better, one study shows that people still seem to prefer human-translated texts. The idea of texts being inherently different despite being of high quality is not new. Translated texts, for example, are also different from original texts, a phenomenon referred to as ‘Translationese’. Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text, computers have been trained to detect Translationese successfully. It remains to be seen...
This paper presents an integrated ABSA pipeline for Dutch that has been developed and tested on qualitative user feedback coming from three domains: retail, banking and human resources. The two latter domains provide service-oriented data, which has not been investigated before in ABSA. By performing in-domain and cross-domain experiments, the validity of our approach was investigated. We show promising results for the three ABSA subtasks: aspect term extraction, aspect category classification and aspect polarity classification.
Research in the field of translation studies suggests that product features can indicate translation difficulty. In the current pilot study, we investigate three of these features, namely the number of errors made in a translation, word entropy and the degree of syntactic equivalence. We correlate these product features with process features that can be put together into categories: duration, revision and gaze information. These process features serve as a proxy for the cognitive effort required to solve difficulties in translation. The data used was gathered from professional translators as well...
This paper presents an emotion classification system for English tweets, submitted for the SemEval shared task on Affect in Tweets, subtask 5: Detecting Emotions. The system combines lexicon, n-gram, style, syntactic and semantic features. For this multi-class multi-label problem, we created a classifier chain. This is an ensemble of eleven binary classifiers, one for each possible emotion category, where each model gets the predictions of the preceding models as additional features. The predicted labels are combined to get a multi-label representation of the predictions....
In this paper, we present a benchmark result for end-to-end cross-document event coreference resolution in Dutch. First, the state of the art of this task in other languages is introduced, as well as currently existing resources and commonly used evaluation metrics. We then build on recently published work to fully explore end-to-end event coreference resolution for the first time in the Dutch language domain. For this purpose, two well-performing transformer-based algorithms for the respective detection and coreference resolution of Dutch textual events are combined in a pipeline architecture and compared to baseline...
In this article, the authors analyze citizens’ reactions to Brexit on social media after the referendum results by performing a content analysis of 5877 posts collected from the platform Flickr, written in English, German, French, Spanish or Italian. Their research aims to answer the three following questions: What multimodal practices are adopted by citizens when they react to societal events like Brexit? To what extent do these practices illustrate types of citizenship that are specific to social networks? Can we observe different...
Times of crisis are usually associated with highly emotional experiences, which often result in emotionally charged communication. This is especially the case on social media. Identifying the emotional climate on social media is imperative in the context of crisis communication, e.g., in view of shaping crisis response strategies. However, the sheer volume of social media data makes manual oversight impossible. In this paper, we therefore investigate how automatic methods for emotion detection can aid research on crisis communication and social media. Concretely, two Dutch emotion detection models (a...
One of the most persistent characteristics of written user-generated content (UGC) is the use of non-standard words. This characteristic contributes to an increased difficulty to automatically process and analyze UGC. Text normalization is the task of transforming lexical variants to their canonical forms and is often used as a pre-processing step for conventional NLP tasks in order to overcome the performance drop that NLP systems experience when applied to UGC. In this work, we follow a Neural Machine Translation approach to text normalization....
The LT3 system perceives ABSA as a task consisting of three main subtasks, which have to be tackled incrementally, namely aspect term extraction, classification and polarity classification. For the first two steps, we see that employing a hybrid terminology extraction system leads to promising results, especially when it comes to recall. For the polarity classification, we show that it is possible to gain satisfying accuracies, even on out-of-domain data, with a basic model employing only lexical information.