- Topic Modeling
- Natural Language Processing Techniques
- Authorship Attribution and Profiling
- Web Data Mining and Analysis
- Wikis in Education and Collaboration
- Semantic Web and Ontologies
- Information Retrieval and Search Behavior
- Advanced Text Analysis Techniques
- Hate Speech and Cyberbullying Detection
- Software Engineering Research
- Misinformation and Its Impacts
- Spam and Phishing Detection
- Sentiment Analysis and Opinion Mining
- Data Quality and Management
- Multimodal Machine Learning Applications
- Names, Identity, and Discrimination Research
- Expert Finding and Q&A Systems
- Academic Integrity and Plagiarism
- Algorithms and Data Compression
- Machine Learning and Algorithms
- Advanced Image and Video Retrieval Techniques
- Text and Document Classification Technologies
- Scientific Computing and Data Management
- Interpreting and Communication in Healthcare
- Text Readability and Simplification
Leipzig University
2015-2024
Bauhaus-Universität Weimar
2012-2024
Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2024
University of Kassel
2023-2024
The University of Queensland
2023-2024
Hess (United States)
2023-2024
Universidade Estadual de Campinas (UNICAMP)
2024
University of Waterloo
2024
CEA LIST
2024
University of Amsterdam
2023
We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, has been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet, as style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share...
Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi...
Hyperpartisan news is news that takes an extreme left-wing or right-wing standpoint. If one is able to reliably compute this meta information, news articles may be automatically tagged, this way encouraging or discouraging readers to consume the text. It is an open question how successfully hyperpartisan news detection can be automated, and the goal of this SemEval task was to shed light on the state of the art. We developed new resources for this purpose, including a manually labeled dataset with 1,273 articles, and a second dataset with 754,000 articles, labeled via distant supervision. The...
When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments, but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments along with concerns and issues that arise. We devise a human-machine collaboration spectrum that allows categorizing different relevance judgment strategies, based on how much humans rely on machines. For the extreme point of 'fully automated judgments', we further...
A spectrum of human-artificial intelligence collaboration in assessing relevance.
Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, Benno Stein. Proceedings of the 4th Workshop on Argument Mining. 2017.
This report documents the program and the outcomes of Dagstuhl Seminar 23031 "Frontiers of Information Access Experimentation for Research and Education", which brought together 38 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research as well as for scientific education. The seminar featured a...
For the identification of plagiarized passages in large document collections, we present retrieval strategies which rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus, we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our index provides an analysis speed-up by a factor of 1.5 and is an order of magnitude smaller, while being equivalent in terms of precision and recall.
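The n-gram index mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's fingerprint index and it omits the stochastic sampling step; it only shows the general idea of indexing word n-grams and looking up which documents share them with a suspicious passage. The function names, the choice of word 3-grams, and the toy documents are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int = 3) -> List[NGram]:
    """Split text on whitespace and return all overlapping word n-grams."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(docs: Dict[str, str], n: int = 3) -> Dict[NGram, Set[str]]:
    """Map each word n-gram to the set of document ids that contain it."""
    index: Dict[NGram, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in word_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def candidate_sources(index: Dict[NGram, Set[str]], suspicious: str, n: int = 3) -> Dict[str, int]:
    """Count, per indexed document, how many distinct n-grams it shares with the passage."""
    hits: Dict[str, int] = defaultdict(int)
    for gram in set(word_ngrams(suspicious, n)):
        for doc_id in index.get(gram, ()):
            hits[doc_id] += 1
    return dict(hits)


# Toy collection (hypothetical data, for illustration only).
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "an entirely different sentence about indexing",
}
index = build_index(docs)
shared = candidate_sources(index, "a quick brown fox jumps high")
print(shared)
```

Documents with many shared n-grams would then be passed on to a more detailed comparison; how candidates are ranked or thresholded is left open here.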
To paraphrase means to rewrite content while preserving the original meaning. Paraphrasing is important in fields such as text reuse in journalism, anonymizing work, and improving the quality of customer-written reviews. This article contributes to paraphrase acquisition and focuses on two aspects that are not addressed by current research: (1) acquisition via crowdsourcing, and (2) acquisition of passage-level samples. The challenge of the first aspect is automatic quality assurance; without such a means the crowdsourcing paradigm is not effective, and without crowdsourcing the creation of test corpora is unacceptably...
We reproduce four Twitter sentiment classification approaches that participated in previous SemEval editions with diverse feature sets. The reproduced approaches are combined in an ensemble, averaging the individual classifiers' confidence scores for the three classes (positive, neutral, negative) and deciding sentiment polarity based on these averages. The experimental evaluation on SemEval data shows our re-implementations to slightly outperform their respective originals. Moreover, not too surprisingly, the ensemble of...
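The ensemble rule described in this abstract (average each classifier's per-class confidence, then pick the class with the highest average) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the dictionary-based score format, and the example scores are hypothetical, and the four underlying classifiers are not reproduced.

```python
from typing import Dict, List

CLASSES = ("positive", "neutral", "negative")


def ensemble_polarity(confidences: List[Dict[str, float]]) -> str:
    """Average per-class confidence scores across classifiers and return the top class."""
    if not confidences:
        raise ValueError("need at least one classifier output")
    averages = {
        label: sum(scores[label] for scores in confidences) / len(confidences)
        for label in CLASSES
    }
    # Ties are broken by the order of CLASSES (an arbitrary choice in this sketch).
    return max(CLASSES, key=lambda label: averages[label])


# Hypothetical confidence scores from four classifiers for a single tweet.
outputs = [
    {"positive": 0.6, "neutral": 0.3, "negative": 0.1},
    {"positive": 0.2, "neutral": 0.5, "negative": 0.3},
    {"positive": 0.5, "neutral": 0.4, "negative": 0.1},
    {"positive": 0.3, "neutral": 0.3, "negative": 0.4},
]
print(ensemble_polarity(outputs))
```

Averaging confidences rather than counting votes lets a classifier that is very sure of its label outweigh others that are only marginally in favor of a different one.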