- Misinformation and Its Impacts
- Topic Modeling
- Sentiment Analysis and Opinion Mining
- Computational Drug Discovery Methods
- Genomics and Rare Diseases
- Statistical Methods in Clinical Trials
- Network Security and Intrusion Detection
- Text and Document Classification Technologies
- Information and Cyber Security
- Hate Speech and Cyberbullying Detection
- Anomaly Detection Techniques and Applications
- Health Systems, Economic Evaluations, Quality of Life
- Bioinformatics and Genomic Networks
- Biosimilars and Bioanalytical Methods
- Media Influence and Politics
- Natural Language Processing Techniques
- Spam and Phishing Detection
- CRISPR and Genetic Engineering
- Authorship Attribution and Profiling
- Environmental and Ecological Studies
- Technology and Data Analysis
University of Sheffield
2023-2025
Open Targets
2020-2024
European Bioinformatics Institute
2020-2023
University of Cambridge
2017
The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery, based upon underlying evidence. It is publicly available and the code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target-disease relationships from 20 different data sources. In addition, we have integrated new key datasets, including...
The Open Targets Platform (https://platform.opentargets.org/) is an open source resource to systematically assist drug target identification and prioritisation using publicly available data. Since our last update, we have reimagined, redesigned, and rebuilt the Platform in order to streamline data integration and harmonisation, expand the ways in which users can explore the data, and improve the user experience. The gene-disease causal evidence has been enhanced and expanded to better capture disease causality across rare, common, and somatic...
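As an illustration of how the Platform described above can be accessed programmatically, the Python sketch below queries its public GraphQL API for the disease associations of a single target. The endpoint URL and field names (target, associatedDiseases, rows, score) follow the publicly documented schema but should be verified against the current API documentation; this is a minimal sketch, not part of the Platform codebase.

# Minimal sketch: fetch target-disease association scores for one target
# from the Open Targets Platform GraphQL API. Field names follow the
# public schema at the time of writing; verify against the API docs.
import requests

API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

QUERY = """
query targetAssociations($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    associatedDiseases {
      rows {
        disease { id name }
        score
      }
    }
  }
}
"""

def top_associations(ensembl_id: str, n: int = 10):
    """Return the top-n disease associations for a target, by overall score."""
    response = requests.post(
        API_URL,
        json={"query": QUERY, "variables": {"ensemblId": ensembl_id}},
        timeout=30,
    )
    response.raise_for_status()
    target = response.json()["data"]["target"]
    rows = sorted(target["associatedDiseases"]["rows"],
                  key=lambda r: r["score"], reverse=True)
    return target["approvedSymbol"], rows[:n]

if __name__ == "__main__":
    symbol, rows = top_associations("ENSG00000157764")  # BRAF
    for row in rows:
        print(f"{symbol}\t{row['disease']['name']}\t{row['score']:.3f}")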
Many drug discovery projects are started but few progress fully through clinical trials to approval. Previous work has shown that human genetic support for the therapeutic hypothesis increases the chance of trial progression. Here, we applied natural language processing to classify the free-text reasons for 28,561 clinical trials that stopped before their endpoints were met. We then evaluated these classes in light of the underlying evidence and target properties. We found that trials are more likely to stop because of a lack of efficacy in the absence of strong genetic...
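To make the classification step concrete, here is a minimal Python sketch of mapping free-text "why stopped" statements to coarse stop-reason classes with an off-the-shelf zero-shot classifier. The label set, model choice, and example texts are illustrative assumptions, not the classifier or taxonomy used in the study.

# Illustrative only: map free-text stop reasons from clinical trial records
# to coarse stop-reason classes with a zero-shot classifier.
from transformers import pipeline

STOP_REASON_LABELS = [
    "insufficient enrollment",
    "lack of efficacy",
    "safety or adverse events",
    "business or administrative reasons",
    "study completed as planned",
]

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

reasons = [
    "Terminated early due to futility at the interim analysis.",
    "The study was stopped because of slow accrual of participants.",
]

for text in reasons:
    result = classifier(text, candidate_labels=STOP_REASON_LABELS)
    # The highest-scoring candidate label is taken as the predicted class.
    print(f"{result['labels'][0]:35s} <- {text}")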
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results demonstrated that these methods can even improve performance on some classification tasks. This paper complements existing research by investigating how these techniques influence classification performance and computation costs compared to full fine-tuning. We focus specifically on multilingual text classification tasks (genre, framing, and persuasion technique detection; with different input lengths,...
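As a concrete illustration of the LoRA setup, the sketch below wraps a multilingual encoder for sequence classification with the Hugging Face peft library and prints the fraction of trainable parameters. The model name, rank, and target modules are illustrative choices under assumed defaults, not the exact configuration evaluated in the paper.

# Minimal LoRA sketch with Hugging Face peft: wrap a multilingual encoder
# for sequence classification and train only the low-rank adapter weights.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=9  # e.g. a genre/framing label set
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                      # rank of the low-rank update matrices
    lora_alpha=16,            # scaling factor applied to the update
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in RoBERTa-style models
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Typically well under 1% of parameters are trainable, which is where the
# compute and memory savings relative to full fine-tuning come from.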
Credibility signals represent a wide range of heuristics typically used by journalists and fact-checkers to assess the veracity of online content. Automating the extraction of credibility signals presents significant challenges due to the necessity of training high-accuracy, signal-specific extractors, coupled with the lack of sufficiently large annotated datasets. This paper introduces Pastel (Prompted weAk Supervision wiTh crEdibility signaLs), a weakly supervised approach that leverages large language models...
Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content. Automating the task of credibility signal extraction, however, is very challenging as it requires high-accuracy signal-specific extractors to be trained, while there are currently no sufficiently large datasets annotated with all credibility signals. This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each...
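A minimal sketch of the weak-supervision idea follows: each credibility signal becomes a separate prompt producing a weak label, and the per-signal votes are aggregated with Snorkel's LabelModel into a single veracity label. The ask_llm helper, the prompts, and the three signals shown are hypothetical placeholders, not the exact Pastel prompt set.

# Sketch of prompting an LLM once per credibility signal to obtain weak
# labels, then aggregating them with Snorkel's LabelModel.
import numpy as np
from snorkel.labeling.model import LabelModel

# Signals phrased so that "yes" suggests low credibility (class 1).
SIGNALS = {
    "loaded_language": "Does the following article use emotionally loaded language? Answer yes or no.\n\n{text}",
    "clickbait": "Is the title of the following article clickbait? Answer yes or no.\n\n{text}",
    "unsubstantiated": "Does the following article make claims without citing sources? Answer yes or no.\n\n{text}",
}

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with the model/API of your choice."""
    raise NotImplementedError

def weak_label_matrix(articles):
    # One column per signal; 1 = votes misinformation, 0 = votes credible, -1 = abstain.
    matrix = []
    for text in articles:
        row = []
        for template in SIGNALS.values():
            answer = ask_llm(template.format(text=text)).strip().lower()
            row.append(1 if answer.startswith("yes") else 0 if answer.startswith("no") else -1)
        matrix.append(row)
    return np.array(matrix)

def aggregate(articles):
    L = weak_label_matrix(articles)
    label_model = LabelModel(cardinality=2, verbose=False)
    label_model.fit(L_train=L, n_epochs=200, seed=42)
    return label_model.predict(L)  # 1 = likely misinformation, 0 = likely credible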
Many drug discovery projects are started, but few progress fully through clinical trials to approval. Previous work has shown that human genetic support for the therapeutic hypothesis increases the chance of trial progression. Here, we applied natural language processing to classify the free-text reasons for 28,842 clinical trials that stopped before their endpoints were met. We then evaluated these classes in light of the underlying evidence and target properties. We show that trials are more likely to stop due to a lack of efficacy in the absence of strong...
Ben Wu, Olesya Razuvayevskaya, Freddy Heppell, João A. Leite, Carolina Scarton, Kalina Bontcheva, Xingyi Song. Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). 2023.
Enthymeme reconstruction, i.e. the task of reformulating arguments with missing propositions, is an exciting task at the borderline of text understanding and argument interpretation. However, there is some doubt in the community about the feasibility of this task due to the wide range of possible reformulations that are open to humans. We therefore believe that research on how to define an objective ground truth for these tasks is necessary before any work on automatic reconstruction can begin. Here, we present a study on finding and expanding enthymemes...
Disinformation, irrespective of domain or language, aims to deceive and manipulate public opinion, typically by employing advanced persuasion techniques. Qualitative and quantitative research on the weaponisation of persuasion techniques in disinformation has been mostly topic-specific (e.g., COVID-19), with limited cross-domain studies, resulting in a lack of comprehensive understanding of these strategies. This study employs a state-of-the-art persuasion technique classifier to conduct a large-scale, multi-domain analysis of the role of 16...
This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. It is our largest to-date resource in terms of the overall number of articles and distinct languages. It also provides topical and temporal coverage. Using this dataset, we investigate the dissemination of disinformation across different languages, uncovering language-specific patterns...
In the current era of social media and generative AI, the ability to automatically assess the credibility of online content is of tremendous importance. Credibility assessment is fundamentally based on aggregating credibility signals, which refer to small units of information, such as factuality, bias, or the presence of persuasion techniques, into an overall credibility score. Credibility signals provide more granular, more easily explainable and widely utilizable information, in contrast to the currently predominant fake news detection, which utilizes various (mostly latent)...
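As a toy illustration of the aggregation idea mentioned above (not a method proposed in the survey), an overall credibility score can be formed as a weighted combination of individual signal scores; the signal names and weights below are made up for the example.

# Toy aggregation: combine per-signal scores in [0, 1] into one credibility
# score using illustrative hand-picked weights (higher = more credible).
SIGNAL_WEIGHTS = {"factuality": 0.5, "source_reputation": 0.3, "no_persuasion_techniques": 0.2}

def credibility_score(signals: dict) -> float:
    return sum(SIGNAL_WEIGHTS[name] * signals[name] for name in SIGNAL_WEIGHTS)

print(credibility_score({"factuality": 0.9, "source_reputation": 0.6, "no_persuasion_techniques": 0.4}))
# -> 0.71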
This paper describes our approach for SemEval-2023 Task 3: Detecting the category, framing, and persuasion techniques in online news in a multi-lingual setup. For Subtask 1 (News Genre), we propose an ensemble of fully trained and adapter mBERT models, which was ranked joint-first for German and had the highest mean rank among the multi-language teams. For Subtask 2 (Framing), we achieved first place in 3 languages, and the best average rank across all languages, by using two separate ensembles: a monolingual RoBERTa-MUPPETLARGE and an XLM-RoBERTaLARGE with adapters...
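For readers unfamiliar with the ensembling step, here is a minimal sketch of soft voting over per-model class probabilities. The checkpoint paths are hypothetical placeholders and the snippet is not the exact ensemble configuration from the system paper.

# Soft-voting ensemble sketch: average class probabilities from several
# fine-tuned classifiers (e.g. fully fine-tuned and adapter-based mBERT
# variants) and take the argmax. Checkpoint paths are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = ["path/to/mbert-full", "path/to/mbert-adapter"]  # hypothetical paths

def ensemble_predict(text: str, num_labels: int) -> int:
    probs = torch.zeros(num_labels)
    for ckpt in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=num_labels)
        model.eval()
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits.squeeze(0)
        probs += torch.softmax(logits, dim=-1)
    # Average the probabilities and return the index of the most likely class.
    return int(torch.argmax(probs / len(CHECKPOINTS)))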
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results demonstrated that these methods can even improve performance on some classification tasks. This paper complements existing research by investigating how these techniques influence computation costs compared to full fine-tuning when applied to multilingual text classification tasks (genre, framing, and persuasion technique detection; with different input lengths, numbers of predicted classes...