NFDI4DS | UHH-SEMS - Publication Details

Arkaitz Zubiaga

ORCID: 0000-0003-4583-3623

About

Contact & Profiles

A5071220716

Research Areas

Topic Modeling
Misinformation and Its Impacts
Hate Speech and Cyberbullying Detection
Sentiment Analysis and Opinion Mining
Complex Network Analysis Techniques
Spam and Phishing Detection
Advanced Text Analysis Techniques
Natural Language Processing Techniques
Text and Document Classification Technologies
Social Media and Politics
Web Data Mining and Analysis
Wikis in Education and Collaboration
Opinion Dynamics and Social Influence
Internet Traffic Analysis and Secure E-voting
Software Engineering Research
Authorship Attribution and Profiling
Text Readability and Simplification
Advanced Graph Neural Networks
Biomedical Text Mining and Ontologies
Data Quality and Management
Data Visualization and Analytics
Recommender Systems and Techniques
Multimodal Machine Learning Applications
Digital Communication and Language
Korean Peninsula Historical and Political Studies

Queen Mary University of London
2018-2024

University of London
2020-2023

City University of New York
2012-2021

University of Southern California
2020

University of Warwick
2015-2019

Swiss National Science Foundation
2017

University College Dublin
2015

Queens College, CUNY
2012-2013

National University of Distance Education
2007-2012

New York University
2012

Analysing How People Orient to and Spread Rumours in Social Media by Looking at Conversational Threads

OPENALEX - Publications

Arkaitz Zubiaga, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, Peter Tolmie

As breaking news unfolds, people increasingly rely on social media to stay abreast of the latest updates. The use of social media in such situations comes with the caveat that new information being released piecemeal may encourage rumours, many of which remain unverified long after their point of release. Little is known, however, about the dynamics of the life cycle of a rumour. In this paper we present a methodology that has enabled us to collect, identify and annotate a dataset of 330 rumour threads (4,842 tweets) associated with 9 newsworthy...

10.1371/journal.pone.0150989 article EN cc-by PLoS ONE 2016-03-04

SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours

OPENALEX - Publications

Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and 1 more

Media is full of false claims. Even Oxford Dictionaries named “post-truth” as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it. RumourEval is a SemEval shared task that aims to handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics – each having their own families of claims and replies – and use these to pose two concrete challenges, as well as the results achieved by participants on these challenges.

10.18653/v1/s17-2006 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

SemEval-2019 Task 7: RumourEval, Determining Rumour Veracity and Support for Rumours

OPENALEX - Publications

Genevieve Gorrell, Elena Kochkina, Maria Liakata, Ahmet Aker, Arkaitz Zubiaga, and 2 more

Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the danger of “fake news” has become a mainstream concern. However, automated support for rumour verification remains in its infancy. It is therefore important that a shared task in this area continues to provide a focus for effort, which is likely to increase. Rumour verification is characterised by the need to consider evolving conversations and news updates to reach a verdict on a rumour’s veracity. As in RumourEval 2017 we provided a dataset of dubious posts and ensuing conversations in social...

10.18653/v1/s19-2147 article EN 2019-01-01

Accelerated DP based search for statistical translation

OPENALEX - Publications

Christoph Tillmann, S. Vogel, Hermann Ney, Arkaitz Zubiaga, Hassan Sawaf

In this paper, we describe a fast search algorithm for statistical translation based on dynamic programming (DP) and present experimental results. The approach is based on the assumption that the word alignment is monotone with respect to the word order in both languages. To reduce the search effort for this approach, we introduce two methods: an acceleration technique to efficiently compute the dynamic programming recursion equation and a beam search strategy as used in speech recognition. The experimental tests carried out on the Verbmobil corpus showed that the search space, measured by the number of hypotheses, is reduced...

10.21437/eurospeech.1997-673 article EN 1997-09-22

Real‐time classification of Twitter trends

OPENALEX - Publications

Arkaitz Zubiaga, Damiano Spina, Raquel Martínez, Víctor Fresno

In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following 4 types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language‐independent features based on the social spread of trends and categorize them using the typology. Our method provides...

10.1002/asi.23186 article EN Journal of the Association for Information Science and Technology 2014-05-09

Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media

OPENALEX - Publications

Arkaitz Zubiaga, Maria Liakata, Rob Procter

Breaking news leads to situations of fast-paced reporting in social media, producing all kinds of updates related to news stories, albeit with the caveat that some of those early updates tend to be rumours, i.e., information with an unverified status at the time of posting. Flagging information that is unverified can be helpful to avoid the spread of information that may turn out to be false. Detection of rumours can also feed a rumour tracking system that ultimately determines their veracity. In this paper we introduce a novel approach to rumour detection that learns from the sequential dynamics of reporting during breaking news in social media...

10.48550/arxiv.1610.07363 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Discourse-aware rumour stance classification in social media using sequential classifiers

OPENALEX - Publications

Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michał Łukasik, and 3 more

10.1016/j.ipm.2017.11.009 article EN Information Processing & Management 2017-12-06

Hawkes Processes for Continuous Time Sequence Classification: an Application to Rumour Stance Classification in Twitter

OPENALEX - Publications

Michał Łukasik, P. K. Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, and 1 more

Michal Lukasik, P. K. Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, Trevor Cohn. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2016.

10.18653/v1/p16-2064 article EN cc-by 2016-01-01

Tweet, but verify: epistemic study of information verification on Twitter

OPENALEX - Publications

Arkaitz Zubiaga, Heng Ji

10.1007/s13278-014-0163-y article EN Social Network Analysis and Mining 2014-03-24

All-in-one: Multi-task Learning for Rumour Verification

OPENALEX - Publications

Elena Kochkina, Maria Liakata, Arkaitz Zubiaga

Automatic resolution of rumours is a challenging task that can be broken down into smaller components that make up a pipeline, including rumour detection, rumour tracking and stance classification, leading to the final outcome of determining the veracity of a rumour. In previous work, these steps in the process of rumour verification have been developed as separate components, where the output of one feeds into the next. We propose a multi-task learning approach that allows joint training of the main and auxiliary tasks, improving the performance of rumour verification. We examine the connection...

10.48550/arxiv.1806.03713 preprint EN cc-by arXiv (Cornell University) 2018-01-01

Automated fact‐checking: A survey

OPENALEX - Publications

Xia Zeng, Amani S. Abumansour, Arkaitz Zubiaga

As online false information continues to grow, automated fact‐checking has gained an increasing amount of attention in recent years. Researchers in the field of Natural Language Processing (NLP) have contributed to the task by building fact‐checking datasets, devising automated fact‐checking pipelines and proposing NLP methods to further research in the development of different components. This article reviews relevant research on automated fact‐checking, covering both the claim detection and claim validation components.

10.1111/lnc3.12438 article EN cc-by Language and Linguistics Compass 2021-10-01

Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover

OPENALEX - Publications

Leon Fröhling, Arkaitz Zubiaga

The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale generation of text with minimal cost undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we propose a simple feature-based classifier for the detection...

10.7717/peerj-cs.443 article EN cc-by PeerJ Computer Science 2021-04-06

Citizen Participation and Machine Learning for a Better Democracy

OPENALEX - Publications

Miguel Arana‐Catania, Felix-Anselm van Lier, Rob Procter, Nataliya Tkachenko, Yulan He, and 2 more

The development of democratic systems is a crucial task, as confirmed by its selection as one of the Millennium Sustainable Development Goals of the United Nations. In this article, we report on the progress of a project that aims to address barriers, one of which is information overload, to achieving effective direct citizen participation in democratic decision-making processes. The main objectives are to explore if the application of Natural Language Processing (NLP) and machine learning can improve citizens’ experience of digital citizen participation platforms. Taking...

10.1145/3452118 article EN Digital Government Research and Practice 2021-05-04

Toward Automated Factchecking

OPENALEX - Publications

Lev Konstantinovskiy, Oliver R. Price, Mevan Babakar, Arkaitz Zubiaga

In an effort to assist factcheckers in the process of factchecking, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. It consists of identifying the set of sentences, out of a long text, deemed capable of being factchecked. This article is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Leveraging the expertise of professional factcheckers, we develop an annotation schema and a benchmark for automated claim detection that is more consistent across time, topics,...

10.1145/3412869 article EN Digital Threats Research and Practice 2021-04-15

Session-based cyberbullying detection in social media: A survey

OPENALEX - Publications

Peiling Yi, Arkaitz Zubiaga

Cyberbullying is a pervasive problem in online social media, where a bully abuses a victim through a social media session. By investigating cyberbullying perpetrated through social media sessions, recent research has looked into mining patterns and features for modelling and understanding the two defining characteristics of cyberbullying: repetitive behaviour and power imbalance. In this survey paper, we define a framework that encapsulates the four different steps that session-based cyberbullying detection should go through, and discuss the multiple challenges...

10.1016/j.osnem.2023.100250 article EN cc-by Online Social Networks and Media 2023-06-17

Towards real-time summarization of scheduled events from twitter streams

OPENALEX - Publications

Arkaitz Zubiaga, Damiano Spina, Enrique Amigó, Julio Gonzalo

We deal with shrinking the stream of tweets for scheduled events in real-time, following two steps: (i) sub-event detection, which determines if something new has occurred, and (ii) tweet selection, which picks a tweet to describe each sub-event. By comparing summaries in three languages to live reports by journalists, we show that simple text analysis methods which do not involve external knowledge lead to summaries that cover 84% of the sub-events on average, and 100% of key types of sub-events (such as goals in soccer).

10.1145/2309996.2310053 article EN 2012-06-25
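
The two steps named in the abstract above (sub-event detection, then tweet selection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the per-minute bucketing, the burst rule (a minute counts as a sub-event when its tweet count exceeds `ratio` times the mean of all earlier minutes), the term-frequency scoring, and the function names `detect_subevents`, `select_tweet` and `summarize`.

```python
from collections import Counter
from statistics import mean


def detect_subevents(counts, ratio=1.7):
    """Step (i), illustrative rule: flag minute i when its tweet count
    exceeds `ratio` times the mean count of all earlier minutes."""
    hits = []
    for i in range(1, len(counts)):
        baseline = mean(counts[:i])
        if baseline > 0 and counts[i] > ratio * baseline:
            hits.append(i)
    return hits


def select_tweet(tweets):
    """Step (ii), illustrative rule: pick the tweet whose distinct tokens
    are, in total, the most frequent among the tweets of that minute."""
    tf = Counter(tok for t in tweets for tok in t.lower().split())
    return max(tweets, key=lambda t: sum(tf[tok] for tok in set(t.lower().split())))


def summarize(minutes, ratio=1.7):
    """`minutes` is a list of lists of tweet texts, one inner list per minute.
    Returns (minute index, representative tweet) for each detected sub-event."""
    counts = [len(m) for m in minutes]
    return [(i, select_tweet(minutes[i])) for i in detect_subevents(counts, ratio)]


if __name__ == "__main__":
    stream = [
        ["kick off"],
        ["nice pass"],
        ["goal by messi", "what a goal", "goal goal", "messi scores a goal"],
        ["calm again"],
    ]
    print(summarize(stream))  # [(2, 'messi scores a goal')]
```

Summing term frequencies favours longer tweets; a real system would likely normalise the score or weight terms against a background corpus, which this sketch leaves out.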

Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter

OPENALEX - Publications

Bo Wang, Arkaitz Zubiaga, Maria Liakata, Rob Procter

Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques on social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, which optimises the amount of data that needs to be gathered by relying only on tweet-inherent features. This enables the application of the spam detection system to a large set of tweets in a timely fashion, potentially...

10.48550/arxiv.1503.07405 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Stance Classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations

OPENALEX - Publications

Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michał Łukasik

Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads in Twitter. The conversation threads are formed by harvesting users' replies to one another, which results in a nested tree-like structure. Previous work addressing the stance classification task has treated each tweet as a separate...

10.48550/arxiv.1609.09028 preprint EN cc-by arXiv (Cornell University) 2016-01-01

A longitudinal assessment of the persistence of twitter datasets

OPENALEX - Publications

Arkaitz Zubiaga

Social media datasets are not always completely replicable. Having to adhere to the requirements of platforms such as Twitter, researchers can only release a list of unique identifiers, which others can then use to recollect the data themselves. This leads to subsets of the data no longer being available, as content can be deleted or user accounts deactivated. To quantify the long‐term impact of this in the replicability of datasets, we perform a longitudinal analysis of the persistence of 30 Twitter datasets, which include more than 147 million tweets. By recollecting...

10.1002/asi.24026 article EN Journal of the Association for Information Science and Technology 2018-05-14
