NFDI4DS | UHH-SEMS - Publication Details

Tomáš Horych

ORCID: 0009-0003-6456-2977

About

Contact & Profiles

A5020875075

Research Areas

Hate Speech and Cyberbullying Detection
Topic Modeling
Media Influence and Politics
Natural Language Processing Techniques
Speech Recognition and Synthesis
Computational and Text Analysis Methods
Misinformation and Its Impacts

Czech Technical University in Prague
2023

Introducing MBIB - The First Media Bias Identification Benchmark Task and Dataset Collection

OPENALEX - Publications

Martin Wessel, Tomáš Horych, Terry Ruas, Akiko Aizawa, Béla Gipp, and 1 more

Although media bias detection is a complex multi-task problem, there is, to date, no unified benchmark grouping these evaluation tasks. We introduce the Media Bias Identification Benchmark (MBIB), a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques. We evaluate MBIB using...

10.1145/3539618.3591882 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18

Multi-Task Media-Bias Analysis Generalization for Pre-Trained Identification of Expressions

OPENALEX - Publications

Tomáš Horych, Martin Wessel, Jan Philip Wahle, Terry Ruas, Jerome Waßmuth, and 4 more

Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative...

10.48550/arxiv.2403.07910 preprint EN arXiv (Cornell University) 2024-02-26

The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection

OPENALEX - Publications

Tomáš Horych, Christian W. Mandl, Terry Ruas, André Greiner-Petter, Béla Gipp, and 2 more

High annotation costs from hiring or crowdsourcing complicate the creation of large, high-quality datasets needed for training reliable text classifiers. Recent research suggests using Large Language Models (LLMs) to automate the annotation process, reducing these costs while maintaining data quality. LLMs have shown promising results in annotating downstream tasks like hate speech detection and political framing. Building on the success in these areas, this study investigates whether LLMs are viable for annotating the complex task of media bias detection and whether a...

10.48550/arxiv.2411.11081 preprint EN arXiv (Cornell University) 2024-11-17
