- Topic Modeling
- Misinformation and Its Impacts
- Hate Speech and Cyberbullying Detection
- Sentiment Analysis and Opinion Mining
- Complex Network Analysis Techniques
- Spam and Phishing Detection
- Advanced Text Analysis Techniques
- Natural Language Processing Techniques
- Text and Document Classification Technologies
- Social Media and Politics
- Web Data Mining and Analysis
- Wikis in Education and Collaboration
- Opinion Dynamics and Social Influence
- Internet Traffic Analysis and Secure E-voting
- Software Engineering Research
- Authorship Attribution and Profiling
- Text Readability and Simplification
- Advanced Graph Neural Networks
- Biomedical Text Mining and Ontologies
- Data Quality and Management
- Data Visualization and Analytics
- Recommender Systems and Techniques
- Multimodal Machine Learning Applications
- Digital Communication and Language
- Korean Peninsula Historical and Political Studies
Queen Mary University of London
2018-2024
University of London
2020-2023
City University of New York
2012-2021
University of Southern California
2020
University of Warwick
2015-2019
Swiss National Science Foundation
2017
University College Dublin
2015
Queens College, CUNY
2012-2013
National University of Distance Education
2007-2012
New York University
2012
As breaking news unfolds, people increasingly rely on social media to stay abreast of the latest updates. The use of social media in such situations comes with the caveat that new information being released piecemeal may encourage rumours, many of which remain unverified long after their point of release. Little is known, however, about the dynamics of the life cycle of a rumour. In this paper we present a methodology that has enabled us to collect, identify and annotate a dataset of 330 rumour threads (4,842 tweets) associated with 9 newsworthy...
Media is full of false claims. Even Oxford Dictionaries named “post-truth” as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it. RumourEval is a SemEval shared task that aims to handle rumours and reactions to them, in text. We present an annotation scheme and a large dataset covering multiple topics, each having their own families of claims and replies, and use these to pose two concrete challenges, as well as the results achieved by participants on these challenges.
Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the danger of “fake news” has become a mainstream concern. However, support for rumour verification remains in its infancy. It is therefore important that this area continues to provide a focus for effort, which is likely to increase. Rumour verification is characterised by the need to consider evolving conversations and news updates to reach a verdict on a rumour’s veracity. As in 2017, we provided a dataset of dubious posts and ensuing social...
In this paper, we describe a fast search algorithm for statistical translation based on dynamic programming (DP) and present experimental results. The approach is based on the assumption that the word alignment is monotone with respect to the word order in both languages. To reduce the search effort for this approach, we introduce two methods: an acceleration technique to efficiently compute the recursion equation and a beam search strategy as used in speech recognition. The tests carried out on the Verbmobil corpus showed that the search space, measured by the number of hypotheses, is reduced...
In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following 4 types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language-independent features based on the social spread of trends and categorize them using the typology. Our method provides...
Breaking news leads to situations of fast-paced reporting in social media, producing all kinds of updates related to news stories, albeit with the caveat that some of those early updates tend to be rumours, i.e., information with an unverified status at the time of posting. Flagging information that is unverified can be helpful to avoid the spread of information that may turn out to be false. Detection of rumours can also feed a rumour tracking system that ultimately determines their veracity. In this paper we introduce a novel approach to rumour detection that learns from the sequential dynamics of reporting during breaking news in social media...
Michal Lukasik, P. K. Srijith, Duy Vu, Kalina Bontcheva, Arkaitz Zubiaga, Trevor Cohn. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2016.
Automatic resolution of rumours is a challenging task that can be broken down into smaller components that make up a pipeline, including rumour detection, rumour tracking and stance classification, leading to the final outcome of determining the veracity of a rumour. In previous work, these steps in the process of rumour verification have been developed as separate components, where the output of one feeds into the next. We propose a multi-task learning approach that allows joint training of the main and auxiliary tasks, improving the performance of rumour verification. We examine the connection...
As online false information continues to grow, automated fact-checking has gained an increasing amount of attention in recent years. Researchers in the field of Natural Language Processing (NLP) have contributed to the task by building datasets, devising pipelines and proposing NLP methods to further the research and development of the different components. This article reviews relevant research on automated fact-checking, covering both claim detection and claim validation.
The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text, in order to avoid scenarios in which the large-scale generation of text with minimal cost undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we propose a simple feature-based classifier for the detection...
The development of democratic systems is a crucial task, as confirmed by its selection as one of the Millennium Sustainable Development Goals of the United Nations. In this article, we report on the progress of a project that aims to address barriers, one of which is information overload, to achieving effective direct citizen participation in decision-making processes. The main objectives are to explore if the application of Natural Language Processing (NLP) and machine learning can improve citizens’ experience of digital platforms. Taking...
In an effort to assist factcheckers in the process of factchecking, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. It consists of identifying the set of sentences, out of a long text, deemed capable of being factchecked. This article is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Leveraging the expertise of professional factcheckers, we develop an annotation schema and a benchmark for automated claim detection that is more consistent across time, topics,...
Cyberbullying is a pervasive problem in online social media, where a bully abuses a victim through a social media session. By investigating cyberbullying perpetrated through social media sessions, recent research has looked into mining patterns and features for modelling and understanding the two defining characteristics of cyberbullying: repetitive behaviour and power imbalance. In this survey paper, we define a framework that encapsulates four different steps that session-based cyberbullying detection should go through, and discuss the multiple challenges...
We deal with shrinking the stream of tweets for scheduled events in real-time, following two steps: (i) sub-event detection, which determines if something new has occurred, and (ii) tweet selection, which picks a tweet to describe each sub-event. By comparing summaries in three languages to live reports by journalists, we show that simple text analysis methods that do not involve external knowledge lead to summaries that cover 84% of the sub-events on average, and 100% of key types of sub-events (such as goals in soccer).
Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques for social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, which optimises the amount of data that needs to be gathered by relying only on tweet-inherent features. This enables the application of the spam detection system to a large set of tweets in a timely fashion, potentially...
Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads in Twitter. The conversation threads are formed by harvesting users' replies to one another, which results in a nested tree-like structure. Previous work addressing stance classification has treated each tweet as a separate...
Social media datasets are not always completely replicable. Having to adhere to the requirements of platforms such as Twitter, researchers can only release a list of unique identifiers, which others can then use to recollect the data themselves. This leads to subsets of the data no longer being available, as content can be deleted or user accounts deactivated. To quantify the long-term impact of this on the replicability of datasets, we perform a longitudinal analysis of the persistence of 30 Twitter datasets, which include more than 147 million tweets. By recollecting...