NFDI4DS | UHH-SEMS - Publication Details

Martin Potthast

ORCID: 0000-0003-2451-0665

About

Contact & Profiles

A5083712311

Research Areas

Topic Modeling
Natural Language Processing Techniques
Authorship Attribution and Profiling
Web Data Mining and Analysis
Wikis in Education and Collaboration
Semantic Web and Ontologies
Information Retrieval and Search Behavior
Advanced Text Analysis Techniques
Hate Speech and Cyberbullying Detection
Software Engineering Research
Misinformation and Its Impacts
Spam and Phishing Detection
Sentiment Analysis and Opinion Mining
Data Quality and Management
Multimodal Machine Learning Applications
Names, Identity, and Discrimination Research
Expert finding and Q&A systems
Academic integrity and plagiarism
Algorithms and Data Compression
Machine Learning and Algorithms
Advanced Image and Video Retrieval Techniques
Text and Document Classification Technologies
Scientific Computing and Data Management
Interpreting and Communication in Healthcare
Text Readability and Simplification

Leipzig University
2015-2024

Bauhaus-Universität Weimar
2012-2024

Commissariat à l'Énergie Atomique et aux Énergies Alternatives
2024

University of Kassel
2023-2024

The University of Queensland
2023-2024

Hess (United States)
2023-2024

Universidade Estadual de Campinas (UNICAMP)
2024

University of Waterloo
2024

CEA LIST
2024

University of Amsterdam
2023

A Stylometric Inquiry into Hyperpartisan and Fake News

OPENALEX - Publications

Martin Potthast Johannes Kiesel Kevin Reinartz Janek Bevendorff Benno Stein

We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. A corpus of 1,627 articles from 9 political publishers, three each from the mainstream, the hyperpartisan left, and the hyperpartisan right, have been fact-checked by professional journalists at BuzzFeed: 97% of the 299 fake news articles identified are also hyperpartisan. We show how a style analysis can distinguish hyperpartisan news from the mainstream (F1 = 0.78), and satire from both (F1 = 0.81). But stylometry is no silver bullet as style-based fake news detection does not work (F1 = 0.46). We further reveal that left-wing and right-wing news share...

10.18653/v1/p18-1022 article EN cc-by Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018-01-01

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Daniel Zeman Martin Popel Milan Straka Jan Hajič Joakim Nivre and 57 more

Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi...

10.18653/v1/k17-3001 article EN cc-by 2017-01-01

Cross-language plagiarism detection

Martin Potthast Alberto Barrón‐Cedeño Benno Stein Paolo Rosso

10.1007/s10579-009-9114-z article EN Language Resources and Evaluation 2010-01-29

SemEval-2019 Task 4: Hyperpartisan News Detection

Johannes Kiesel Maria Mestre Rishabh Shukla Emmanuel Vincent Payam Adineh and 3 more

Hyperpartisan news is news that takes an extreme left-wing or right-wing standpoint. If one is able to reliably compute this meta information, news articles may be automatically tagged, this way encouraging or discouraging readers to consume the text. It is an open question how successfully hyperpartisan news detection can be automated, and the goal of this SemEval task was to shed light on the state of the art. We developed new resources for this purpose, including a manually labeled dataset with 1,273 articles, and a second dataset with 754,000 articles, labeled via distant supervision. The...

10.18653/v1/s19-2145 article EN cc-by 2019-01-01

Perspectives on Large Language Models for Relevance Judgment

Guglielmo Faggioli Laura Dietz Charles L. A. Clarke Gianluca Demartini Matthias Hagen and 6 more

When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments along with concerns and issues that arise. We devise a human-machine collaboration spectrum that allows to categorize different relevance judgment strategies, based on how much humans rely on machines. For the extreme point of 'fully automated judgments', we further...

10.1145/3578337.3605136 article EN 2023-08-09

Who Determines What Is Relevant? Humans or AI? Why Not Both?

Guglielmo Faggioli Laura Dietz Charles L. A. Clarke Gianluca Demartini Matthias Hagen and 6 more

A spectrum of human-artificial intelligence collaboration in assessing relevance.

10.1145/3624730 article EN Communications of the ACM 2024-03-15

A Wikipedia-based multilingual retrieval model

Martin Potthast Benno Stein Maik Anderka

10.5555/1793274.1793338 article EN European Conference on Information Retrieval 2008-03-30

Building an Argument Search Engine for the Web

Henning Wachsmuth Martin Potthast Khalid Al‐Khatib Yamen Ajjour Jana Puschmann and 5 more

Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, Benno Stein. Proceedings of the 4th Workshop on Argument Mining. 2017.

10.18653/v1/w17-5106 article EN cc-by 2017-01-01

Automatic vandalism detection in Wikipedia

Martin Potthast Benno Stein Robert Gerling

10.5555/1793274.1793363 article EN European Conference on Information Retrieval 2008-03-30

Report on the Dagstuhl Seminar on Frontiers of Information Access Experimentation for Research and Education

Christine Bauer Ben Carterette Nicola Ferro Norbert Fuhr Joeran Beel and 33 more

This report documents the program and the outcomes of Dagstuhl Seminar 23031 "Frontiers of Information Access Experimentation for Research and Education", which brought together 38 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research as well as for scientific education. The seminar featured a...

10.1145/3636341.3636351 article EN ACM SIGIR Forum 2023-06-01

Strategies for retrieving plagiarized documents

Benno Stein Sven Meyer zu Eissen Martin Potthast

For the identification of plagiarized passages in large document collections we present retrieval strategies which rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our index provides an analysis speed-up by a factor of 1.5 and is an order of magnitude smaller, while being equivalent in terms of precision and recall.

10.1145/1277741.1277928 article EN 2007-07-23

Paraphrase acquisition via crowdsourcing and machine learning

Steven Burrows Martin Potthast Benno Stein

To paraphrase means to rewrite content while preserving the original meaning. Paraphrasing is important in fields such as text reuse in journalism, anonymizing work, and improving the quality of customer-written reviews. This article contributes to paraphrase acquisition and focuses on two aspects that are not addressed by current research: (1) acquisition via crowdsourcing, and (2) acquisition of passage-level samples. The challenge of the first aspect is automatic quality assurance; without such a means the crowdsourcing paradigm is not effective, and without crowdsourcing the creation of test corpora is unacceptably...

10.1145/2483669.2483676 article EN ACM Transactions on Intelligent Systems and Technology 2013-06-01

Webis: An Ensemble for Twitter Sentiment Detection

Matthias Hagen Martin Potthast Michel Büchner Benno Stein

We reproduce four Twitter sentiment classification approaches that participated in previous SemEval editions with diverse feature sets. The reproduced approaches are combined in an ensemble, averaging the individual classifiers' confidence scores for the three classes (positive, neutral, negative) and deciding sentiment polarity based on these averages. The experimental evaluation on SemEval data shows our re-implementations to slightly outperform their respective originals. Moreover, not too surprisingly, the ensemble of...

10.18653/v1/s15-2097 article EN cc-by Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) 2015-01-01
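
The ensemble step described in the abstract above (averaging each classifier's confidence scores over the three classes and picking the class with the highest average) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `ensemble_polarity`, the dictionary layout, and the example scores are all hypothetical.

```python
from typing import Dict, List

# The three sentiment classes named in the abstract.
CLASSES = ("positive", "neutral", "negative")


def ensemble_polarity(confidences: List[Dict[str, float]]) -> str:
    """Average per-class confidence scores across classifiers and
    return the class with the highest average.

    `confidences` holds one dict per classifier, mapping each class
    label to that classifier's confidence (hypothetical layout).
    """
    if not confidences:
        raise ValueError("need at least one classifier output")
    averages = {
        label: sum(scores[label] for scores in confidences) / len(confidences)
        for label in CLASSES
    }
    # Ties are broken by the order of CLASSES (an arbitrary choice here).
    return max(CLASSES, key=lambda label: averages[label])


# Made-up confidence scores from four classifiers for a single tweet.
example_scores = [
    {"positive": 0.6, "neutral": 0.3, "negative": 0.1},
    {"positive": 0.2, "neutral": 0.5, "negative": 0.3},
    {"positive": 0.5, "neutral": 0.4, "negative": 0.1},
    {"positive": 0.4, "neutral": 0.4, "negative": 0.2},
]

if __name__ == "__main__":
    print(ensemble_polarity(example_scores))
```

Averaging confidences rather than counting votes lets a classifier that is very sure of its label outweigh several that are only marginally in favor of another one.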
