- Natural Language Processing Techniques
- Topic Modeling
- Semantic Web and Ontologies
- Hate Speech and Cyberbullying Detection
- Sentiment Analysis and Opinion Mining
- Advanced Text Analysis Techniques
- Misinformation and Its Impacts
- Speech and dialogue systems
- Text Readability and Simplification
- Biomedical Text Mining and Ontologies
- Language, Metaphor, and Cognition
- Linguistic Studies and Language Acquisition
- Computational and Text Analysis Methods
- Spam and Phishing Detection
- Language and cultural evolution
- Digital Communication and Language
- Mobile Crowdsensing and Crowdsourcing
- Data-Driven Disease Surveillance
- Terrorism, Counterterrorism, and Political Violence
- Software Engineering Research
- Wikis in Education and Collaboration
- Web Data Mining and Analysis
- Social Media and Politics
- Data Mining Algorithms and Applications
- Phonetics and Phonology Research
University of Groningen
2017-2025
University of Ljubljana
2024
University of Sarajevo
2024
University of Novi Sad
2024
University Medical Center Groningen
2024
University of Belgrade
2024
Sofia University "St. Kliment Ohridski"
2024
University of Turin
2020-2023
IT University of Copenhagen
2023
Tokyo Institute of Technology
2023
The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic...
We introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have curated and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the retrained version on three English datasets for offensive, abusive language and hate speech detection tasks. In all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing...
Firoj Alam, Shaden Shaar, Fahim Dalvi, Hassan Sajjad, Alex Nikolov, Hamdy Mubarak, Giovanni Da San Martino, Ahmed Abdelali, Nadir Durrani, Kareem Darwish, Abdulaziz Al-Homaid, Wajdi Zaghouani, Tommaso Caselli, Gijs Danoe, Friso Stolk, Britt Bruntink, Preslav Nakov. Findings of the Association for Computational Linguistics: EMNLP 2021.
This paper reports on the Event StoryLine Corpus (ESC) v1.0, a new benchmark dataset for temporal and causal relation detection. By developing this dataset, we also introduce a new task, StoryLine Extraction from news data, which aims at extracting and classifying events relevant for stories, from across news documents spread in time and clustered around a single seminal event or topic. In addition to describing the dataset, we report on three baseline systems whose results show the complexity of the task and suggest directions for the development of more robust systems.
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines for event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether each contains causal relations or not. Our corpus is known as the Causal News Corpus (CNC)....
Conceptual concreteness and categorical specificity are two continuous variables that allow distinguishing, for example, justice (low concreteness) from banana (high concreteness) and furniture (low specificity) from rocking chair (high specificity). The relation between these two variables is unclear, with some scholars suggesting that they might be highly correlated. In this study, we operationalize both variables and conduct a series of analyses on a sample of > 13,000 nouns, to investigate the relationship between them. Concreteness is operationalized by means of concreteness ratings,...
New expressions, or neologisms, continue to emerge in the discourse around climate issues (e.g., “flight shame”). Does the emergence of neologisms merely reflect shifts in sustainable attitudes, or can new expressions also speed up/frustrate social change? Building on literature grounded in linguistics and environmental psychology, we conclude that neologisms may have an important, yet underrated and not sufficiently investigated potential to influence social change. In this Focus Article, we first discuss the way in which...
Stories are the most natural ways for people to deal with information about the changing world. They provide an efficient schematic structure to order and relate events according to some explanation. We describe (1) a formal model for representing storylines to handle streams of news and (2) a first implementation of a system that automatically extracts the ingredients of a storyline from news articles according to the model. Our model mimics the basic notions from narratology by adding bridging relations to timelines of events in relation to a climax point. We provide a method for defining the climax score...
Fiona Anting Tan, Hansi Hettiarachchi, Ali Hürriyetoğlu, Tommaso Caselli, Onur Uca, Farhana Ferdousi Liza, Nelleke Oostdijk. Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE). 2022.
Sentiment analysis tends to focus on the polarity of words, combining their values to detect which portion of a text is opinionated. CLIPEval wants to promote a more holistic approach, looking at psychological research that frames the connotations of words as the emotional values activated by them. The implicit polarity of events is just one aspect of connotative meaning, and we address it with a task based on a dataset of sentences annotated as instantiations of pleasant and unpleasant events previously collected in psychological research as the ones on which human judgments converge.
Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko. Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021). 2021.