- Authorship Attribution and Profiling
- Natural Language Processing Techniques
- Topic Modeling
- Hate Speech and Cyberbullying Detection
- Bullying, Victimization, and Aggression
- Personality Traits and Psychology
- Spam and Phishing Detection
- Biomedical Text Mining and Ontologies
- E-commerce and Technology Innovations
- Lexicography and Language Studies
- Software Engineering Research
- Speech and Dialogue Systems
- Advanced Text Analysis Techniques
- Information and Cyber Security
- Stalking, Cyberstalking, and Harassment
- Advanced Malware Detection Techniques
- Speech Recognition and Synthesis
- Dental Implant Techniques and Outcomes
- Biometric Identification and Security
- Text Readability and Simplification
- Names, Identity, and Discrimination Research
- Linguistics and Terminology Studies
- Artificial Intelligence in Games
- Semantic Web and Ontologies
- Sentiment Analysis and Opinion Mining
University of Antwerp
2012-2021
Jožef Stefan International Postgraduate School
2017
Jožef Stefan Institute
2017
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling the posts written by bullies,...
We present a dictionary-based approach to racism detection in Dutch social media comments, which were retrieved from two public Belgian social media sites likely to attract racist reactions. These comments were labeled as racist or non-racist by multiple annotators. For our approach, three discourse dictionaries were created: first, we created a dictionary by retrieving possibly racist and more neutral terms from the training data, and then augmenting these with more general words to remove some bias. A second dictionary was created through automatic expansion using...
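The dictionary-based idea in the abstract above can be illustrated with a minimal sketch. It assumes each dictionary is a named set of terms and that a comment is represented by the relative frequency of hits per dictionary. The category names, placeholder terms, and the `dictionary_features` helper are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical placeholder dictionaries; the actual lexicons are not given in the text.
DICTIONARIES = {
    "category_a": {"foo", "bar"},
    "category_b": {"baz"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def dictionary_features(text, dictionaries=DICTIONARIES):
    """Return, per dictionary, the share of tokens that match one of its terms."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero for empty comments
    counts = Counter(tokens)
    return {
        name: sum(counts[term] for term in terms) / total
        for name, terms in dictionaries.items()
    }
```

Feature vectors of this kind could then be passed to any standard classifier; which classifier the authors used is not stated in the truncated abstract.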
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i)...
An important bottleneck in the development of accurate and robust personality recognition systems based on supervised machine learning is the limited availability of training data and the high cost involved in collecting it. In this paper, we report on a proof of concept of using ensemble learning as a way to alleviate the data acquisition problem. The approach allows the use of information from datasets from different genres, personality classification systems, and even different languages in the construction of a classifier, thereby improving its performance. The exploratory...
We present the results of the first gender classification experiments on Slovene text to our knowledge. Inspired by the TwiSty corpus and experiments (Verhoeven et al., 2016), we employed the Janes corpus (Erjavec et al., 2016) and its gender annotations to perform gender classification experiments on Twitter text, comparing a token-based and a lemma-based approach. We find that the token-based approach (92.6% accuracy), containing gender markings related to the author, outperforms the lemma-based approach by about 5%. Especially in the lemmatized version, we also observe stylistic and content-based differences in writing between men (e.g. more profane language,...
Given the common ancestry of Dutch and Afrikaans, it is not surprising that they use similar periphrastic constructions to express progressive meaning: aan het (Dutch) and aan die/’t (Afrikaans), lit. ‘at the’; bezig met/(om) te, lit. ‘busy with/to’ (Dutch), and besig om te, lit. ‘busy to’ (Afrikaans); and the so-called cardinal posture verb (zitten/sit ‘sit’, staan ‘stand’, liggen/lê ‘lie’ and lopen/loop ‘walk’) constructions, CPV te (‘to’, Dutch) and CPV en (‘and’, Afrikaans). However, these cognate constructions have grammaticalized to different extents. To assess the exact nature...
This paper describes our submission for the WCPR14 shared task on computational personality recognition. We have investigated whether the features proposed by Soler and Wanner (2014) for gender prediction might also be useful in personality recognition, and compared these features with simple approaches using token unigrams, character trigrams, and LIWC features. Although the newly investigated features seem to work quite well on certain personality traits, they do not outperform the simple approaches.
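The two simple baselines named in the abstract above, token unigrams and character trigrams, can be sketched as plain count features. This is a minimal illustration that assumes whitespace tokenization and lowercasing; the preprocessing actually used in the submission is not described here, and the function names are hypothetical.

```python
from collections import Counter


def token_unigrams(text):
    """Count whitespace-separated, lowercased tokens."""
    return Counter(text.lower().split())


def char_trigrams(text):
    """Count all overlapping three-character substrings of the lowercased text."""
    lowered = text.lower()
    return Counter(lowered[i:i + 3] for i in range(len(lowered) - 2))
```

Character trigrams cross word boundaries in this sketch (spaces are kept), which is one common choice; restricting them to within-word spans would be an equally reasonable variant.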
We propose a novel way to create categorized discourse lexicons for multiple languages. We combine information from the Penn Discourse Treebank with statistical machine translation techniques on the Europarl corpus. Using gender profiling as an application, we evaluate our approach by comparing it with an approach using features from a knowledge-based lexicon and with a Rhetorical Structure Theory (RST) parser. Our experiments are performed on corpora for three languages (English, Dutch, and German) in two genres (news and blogs). We include...
Compounding, the process of combining several simplex words into a complex whole, is productive in a wide range of languages. In particular, concatenative compounding, in which the components are “glued” together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro project, which focuses on compounding in the closely related languages Afrikaans and Dutch. The project consists of subprojects focusing on compound splitting (identifying the boundaries of the components) and compound semantics...
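To make the compound-splitting task concrete, here is a minimal lexicon-lookup sketch. It is not the AuCoPro method, which the truncated abstract does not describe. It assumes a toy lexicon of simplex words, tries the longest known first component, and ignores linking elements between components; `split_compound` and `TOY_LEXICON` are hypothetical names.

```python
# Hypothetical toy lexicon of Dutch words, for illustration only.
TOY_LEXICON = {"voet", "bal", "veld", "voetbal"}


def split_compound(word, lexicon, min_len=2):
    """Return one segmentation of `word` into lexicon entries, or None if none exists.

    Words already in the lexicon are returned unsplit. Otherwise the longest
    known first component is tried first, and the remainder is split recursively.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    # Candidate boundaries, from the longest head to the shortest.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_len)
            if rest is not None:
                return [head] + rest
    return None
```

With the toy lexicon, `split_compound("voetbalveld", TOY_LEXICON)` yields `["voetbal", "veld"]`. A lookup of this kind fails on any component missing from the lexicon, which is the problem with predefined lexicons that the abstract points to.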