Martin Potthast

ORCID: 0000-0003-2451-0665
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Authorship Attribution and Profiling
  • Web Data Mining and Analysis
  • Wikis in Education and Collaboration
  • Semantic Web and Ontologies
  • Information Retrieval and Search Behavior
  • Advanced Text Analysis Techniques
  • Hate Speech and Cyberbullying Detection
  • Software Engineering Research
  • Misinformation and Its Impacts
  • Spam and Phishing Detection
  • Sentiment Analysis and Opinion Mining
  • Data Quality and Management
  • Multimodal Machine Learning Applications
  • Names, Identity, and Discrimination Research
  • Expert finding and Q&A systems
  • Academic integrity and plagiarism
  • Algorithms and Data Compression
  • Machine Learning and Algorithms
  • Advanced Image and Video Retrieval Techniques
  • Text and Document Classification Technologies
  • Scientific Computing and Data Management
  • Interpreting and Communication in Healthcare
  • Text Readability and Simplification

Leipzig University
2015-2024

Bauhaus-Universität Weimar
2012-2024

Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2024

University of Kassel
2023-2024

The University of Queensland
2023-2024

Hess (United States)
2023-2024

Universidade Estadual de Campinas (UNICAMP)
2024

University of Waterloo
2024

CEA LIST
2024

University of Amsterdam
2023

We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, has been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet, as style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share...

10.18653/v1/p18-1022 article EN cc-by Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018-01-01

Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi...

10.18653/v1/k17-3001 article EN cc-by 2017-01-01

10.1007/s10579-009-9114-z article EN Language Resources and Evaluation 2010-01-29

Hyperpartisan news is news that takes an extreme left-wing or right-wing standpoint. If one is able to reliably compute this meta information, news articles may be automatically tagged, in this way encouraging or discouraging readers to consume the text. It is an open question how successfully hyperpartisan news detection can be automated, and the goal of this SemEval task was to shed light on the state of the art. We developed new resources for this purpose, including a manually labeled dataset with 1,273 articles, and a second dataset with 754,000 articles, labeled via distant supervision. The...

10.18653/v1/s19-2145 article EN cc-by 2019-01-01

When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments, but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments along with concerns and issues that arise. We devise a human-machine collaboration spectrum that allows categorizing different relevance judgment strategies, based on how much humans rely on machines. For the extreme point of 'fully automated judgments', we further...

10.1145/3578337.3605136 article EN 2023-08-09

10.5555/1793274.1793338 article EN European Conference on Information Retrieval 2008-03-30

Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, Benno Stein. Proceedings of the 4th Workshop on Argument Mining. 2017.

10.18653/v1/w17-5106 article EN cc-by 2017-01-01

10.5555/1793274.1793363 article EN European Conference on Information Retrieval 2008-03-30

This report documents the program and the outcomes of Dagstuhl Seminar 23031 "Frontiers of Information Access Experimentation for Research and Education", which brought together 38 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research as well as for scientific education. The seminar featured a...

10.1145/3636341.3636351 article EN ACM SIGIR Forum 2023-06-01

For the identification of plagiarized passages in large document collections, we present retrieval strategies which rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus, we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our index provides an analysis speed-up by a factor of 1.5 and is an order of magnitude smaller, while being equivalent in terms of precision and recall.

10.1145/1277741.1277928 article EN 2007-07-23
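As a rough illustration of the n-gram (chunk) index idea mentioned in the abstract above, the following minimal sketch indexes word 3-grams and counts how many chunks a suspicious passage shares with each indexed document. It is not the paper's fingerprint index and does not implement stochastic sampling; the function names, the whitespace tokenizer, and the choice of n = 3 are assumptions made for illustration only.

```python
from collections import defaultdict


def word_ngrams(text, n=3):
    """Return the set of word n-grams (chunks) of a text (hypothetical helper)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(docs, n=3):
    """Map every n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in word_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def candidates(index, suspicious, n=3):
    """Count, per indexed document, the n-grams shared with the suspicious text."""
    hits = defaultdict(int)
    for gram in word_ngrams(suspicious, n):
        for doc_id in index.get(gram, ()):
            hits[doc_id] += 1
    return dict(hits)


# Toy collection standing in for a large corpus (made-up data).
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "an entirely different sentence about retrieval",
}
index = build_index(docs)
shared = candidates(index, "a quick brown fox jumps high")
```

Documents with many shared chunks would be candidates for a closer comparison; a real system would additionally need to bound the index size, which is the concern the abstract raises.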

To paraphrase means to rewrite content while preserving the original meaning. Paraphrasing is important in fields such as text reuse in journalism, anonymizing work, and improving the quality of customer-written reviews. This article contributes to paraphrase acquisition and focuses on two aspects that are not addressed by current research: (1) acquisition via crowdsourcing, and (2) acquisition of passage-level samples. The challenge of the first aspect is automatic quality assurance; without such a means the crowdsourcing paradigm is not effective, and without crowdsourcing the creation of test corpora is unacceptably...

10.1145/2483669.2483676 article EN ACM Transactions on Intelligent Systems and Technology 2013-06-01

We reproduce four Twitter sentiment classification approaches that participated in previous SemEval editions with diverse feature sets. The reproduced approaches are combined in an ensemble, averaging the individual classifiers' confidence scores for the three classes (positive, neutral, negative) and deciding sentiment polarity based on these averages. The experimental evaluation on SemEval data shows our re-implementations to slightly outperform their respective originals. Moreover, not too surprisingly, the ensemble of...

10.18653/v1/s15-2097 article EN cc-by Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) 2015-01-01
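The ensemble step described in the abstract above (averaging each classifier's confidence scores per class, then choosing the class with the highest average) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and the confidence values are made up and do not come from the paper.

```python
from statistics import mean

CLASSES = ("positive", "neutral", "negative")


def ensemble_polarity(confidences):
    """Average per-class confidences over all classifiers and pick the best class.

    `confidences` is a list with one dict per classifier, mapping each class
    name to that classifier's confidence (hypothetical data layout).
    """
    averaged = {c: mean(scores[c] for scores in confidences) for c in CLASSES}
    label = max(CLASSES, key=averaged.get)
    return label, averaged


# Made-up confidence scores from four classifiers for a single tweet.
example = [
    {"positive": 0.6, "neutral": 0.3, "negative": 0.1},
    {"positive": 0.2, "neutral": 0.5, "negative": 0.3},
    {"positive": 0.5, "neutral": 0.4, "negative": 0.1},
    {"positive": 0.7, "neutral": 0.2, "negative": 0.1},
]
label, averaged = ensemble_polarity(example)
```

Averaging lets a confident classifier outweigh a hesitant one, which a plain majority vote over predicted labels would not capture.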