- Natural Language Processing Techniques
- Topic Modeling
- Authorship Attribution and Profiling
- Speech Recognition and Synthesis
- Semantic Web and Ontologies
- Text Readability and Simplification
- Speech and Dialogue Systems
- Hate Speech and Cyberbullying Detection
- Linguistic Variation and Morphology
- Language Development and Disorders
- Language and Cultural Evolution
- Swearing, Euphemism, Multilingualism
- Music and Audio Processing
- Phonetics and Phonology Research
- Syntax, Semantics, Linguistic Variation
- Mental Health via Writing
- Freedom of Expression and Defamation
- Social Media and Politics
- Digital Communication and Language
- Language, Metaphor, and Cognition
- Sentiment Analysis and Opinion Mining
- Linguistics, Language Diversity, and Identity
- Hearing Loss and Rehabilitation
- Mathematics, Computing, and Information Processing
- Legal Language and Interpretation
University of Tübingen
2015-2024
University of Pennsylvania
2018
University of Colorado Boulder
2018
Commonwealth Scientific and Industrial Research Organisation
2018
Nuance Communications (United Kingdom)
2018
California University of Pennsylvania
2018
German Research Centre for Artificial Intelligence
2018
Toyota Technological Institute at Chicago
2018
Uppsala University
2017
University of Groningen
2013-2015
Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin. Proceedings of the Fourteenth Workshop on Semantic Evaluation. 2020.
Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi...
This paper presents a set of classification experiments for identifying depression in posts gathered from social media platforms. In addition to the data used previously by other researchers, we collect additional data from the social media platform Reddit. Our experiments show promising results for identifying depression from social media texts. More importantly, however, we show that the choice of corpora is crucial and can lead to misleading conclusions in case of a poor choice of data.
This paper presents the ParlaMint corpora containing transcriptions of the sessions of 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project's GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter...
Noëmi Aepli, Çağrı Çöltekin, Rob Van Der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri. Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023). 2023.
This paper describes our participation in the SemEval-2018 task Multilingual Emoji Prediction. We participated in both the English and Spanish subtasks, experimenting with support vector machines (SVMs) and recurrent neural networks. Our SVM classifier obtained the top rank in both subtasks, with macro-averaged F1-measures of 35.99% for the English and 22.36% for the Spanish data sets. Similar to a few earlier attempts, the results with neural networks were not on par with linear SVMs.
This paper describes the work done by team tearsofjoy participating in the VarDial 2019 Evaluation Campaign. We developed two systems based on Support Vector Machines: SVM with a flat combination of features and SVM ensembles. We participated in all language/dialect identification tasks, as well as the Moldavian vs. Romanian cross-dialect topic identification (MRC) task. Our team achieved first place in German Dialect Identification (GDI) and MRC subtasks 2 and 3, second place in the simplified variant of Discriminating between Mainland and Taiwan variation of Mandarin Chinese...
Gabmap is a freely available, open-source web application that analyzes the data of language variation, e.g. varying words for the same concepts, varying pronunciations for the same words, or varying frequencies of syntactic constructions in transcribed conversations. Gabmap is an integrated part of CLARIN (see http://portal.clarin.nl). This article summarizes Gabmap's basic functionality, adding material on some new features and reporting on the range of uses to which Gabmap has been put. Gabmap is modestly successful, and its popularity underscores the fact that the study...
This paper describes our systems and results on the VarDial 2017 shared tasks. Besides three language/dialect discrimination tasks, we also participated in the cross-lingual dependency parsing (CLP) task using a simple methodology which we also briefly describe in this paper. For all the discrimination tasks, we used linear SVMs with character and word features. The system achieves competitive results among other systems in the shared task. We also report additional experiments with neural network models. The performance of the neural network models was close but always below the corresponding SVM...
Aleksandrs Berdicevskis, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, Christian Bentz. Proceedings of the Second Workshop on Universal Dependencies (UDW 2018). 2018.
We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020). The task involves three subtasks corresponding to the hierarchical taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The task featured five languages: English, Arabic, Danish, Greek, and Turkish for Subtask A. In addition, English also featured Subtasks B and C. OffensEval 2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all languages....