- Topic Modeling
- Natural Language Processing Techniques
- Social Media and Politics
- Biomedical Text Mining and Ontologies
- Hate Speech and Cyberbullying Detection
- Misinformation and Its Impacts
- Speech and Dialogue Systems
- Machine Learning in Healthcare
- Populism, Right-Wing Movements
- Electronic Health Records Systems
- Speech Recognition and Synthesis
- Semantic Web and Ontologies
- Gender, Feminism, and Media
- Data Quality and Management
- Spam and Phishing Detection
- Opinion Dynamics and Social Influence
- Neural Networks and Applications
- Mental Health Research Topics
- Schizophrenia Research and Treatment
- Cybercrime and Law Enforcement Studies
- Mental Health and Psychiatry
- Terrorism, Counterterrorism, and Political Violence
- Wikis in Education and Collaboration
- Personal Information Management and User Behavior
- Government, Law, and Information Management
University of Sheffield
2011-2020
Linköping University
2003-2006
SRI International
2001
University of Cambridge
2000
Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the danger of “fake news” has become a mainstream concern. However, automated support for rumour verification remains in its infancy. It is therefore important that a shared task in this area continues to provide a focus for effort, which is likely to increase. Rumour verification is characterised by the need to consider evolving conversations and news updates to reach a verdict on a rumour’s veracity. As in RumourEval 2017, we provided a dataset of dubious posts and ensuing conversations in social...
Objectives: We sought to use natural language processing to develop a suite of language models to capture key symptoms of severe mental illness (SMI) from clinical text, to facilitate the secondary use of mental healthcare data in research. Design: Development and validation of information extraction applications for ascertaining symptoms of SMI in routine mental health records using the Clinical Record Interactive Search (CRIS) data resource; description of their distribution in a corpus of discharge summaries. Setting: Electronic records from a large mental healthcare provider serving a geographic...
Objective: Unlocking the data contained within both structured and unstructured components of electronic health records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management, and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying...
This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated...
Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies...
Objectives: To identify negative symptoms in the clinical records of a large sample of patients with schizophrenia using natural language processing, and to assess their relationship with clinical outcomes. Design: Observational study using an anonymised electronic health record case register. Setting: South London and Maudsley NHS Trust (SLaM), a large provider of inpatient and community mental healthcare in the UK. Participants: 7678 patients with schizophrenia receiving care during 2011. Main outcome measures: Hospital admission, readmission and duration of admission. Results: 10...
Purpose: This paper seeks to discuss reliability problems associated with questionnaires, commonly employed in library and information science. It aims to focus on the effects of “common method variance” (CMV), which is a form of bias, and on ways of countering these effects. Design/methodology/approach: The paper critically reviews the use of existing tools for demonstrating reliability in questionnaire‐based studies. In particular, it focuses on Cronbach's alpha, “Harman's single factor test” and Lindell and Whitney's “marker variable” approach....
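As a rough illustration of one of the reliability tools named above, the following sketch computes Cronbach's alpha from a table of questionnaire responses using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the input layout (one row per respondent, one column per item) are assumptions made for this example and do not come from the paper.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows.

    `responses` is a hypothetical layout: one list per respondent,
    each holding that respondent's scores on the same k items.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Transpose rows into one column of scores per item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in items)
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

Sample variances (n - 1 denominator) are used for both the items and the totals, so the choice of denominator cancels in the ratio.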
Risk assessment of suicidal behavior is a time-consuming but notoriously inaccurate activity for mental health services globally. In the last 50 years a large number of tools have been designed for suicide risk assessment, and tested in a wide variety of populations, but studies show that these tools suffer from low positive predictive values. More recently, advances in research fields such as machine learning and natural language processing applied on large datasets have shown promising results for health care, and may enable an important shift...
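To make the point about low positive predictive values concrete, the sketch below applies Bayes' rule to show how PPV depends on the prevalence of the outcome. The function name and the example figures in the comments are purely illustrative assumptions; they are not results from the paper.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive screen is a true positive (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_rate + false_positive_rate == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical tool with 80% sensitivity and 80% specificity.
# For a rare outcome (1% prevalence) most positives are false alarms,
# while the same tool looks far better at 50% prevalence.
rare = positive_predictive_value(0.8, 0.8, 0.01)
common = positive_predictive_value(0.8, 0.8, 0.5)
```

With these made-up inputs, `rare` is under 0.04 while `common` is 0.8, which is why a tool with respectable sensitivity and specificity can still have a low PPV for a rare event.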
The 2019 UK general election took place against a background of rising online hostility levels toward politicians, and concerns about the impact of this on democracy, as a record number of politicians cited the abuse they had been receiving as a reason for not standing for re-election. We present a four-factor framework for understanding who receives online abuse and why. The four factors are prominence, events, online engagement and personal characteristics. We collected 4.2 million tweets sent to or from election candidates in the six week period...
A UK-based online questionnaire investigating aspects of usage of user-generated media (UGM), such as Facebook, LinkedIn and Twitter, attracted 587 participants. Results show a high degree of engagement with social networking media and a significant engagement with other media such as professional media, microblogs and blogs. Participants who experience information overload are those who engage less frequently with the media, rather than those who have fewer posts to read. Professional users show different behaviours to social users. Microbloggers complain of information overload to the greatest extent. Two...
Concerns have reached the mainstream about how social media are affecting political outcomes. One trajectory for this is the exposure of politicians to online abuse. In this paper we use 1.4 million tweets from the months before the 2015 and 2017 UK general elections to explore the abuse directed at politicians. Results show that abuse increased substantially in 2017 compared with 2015. Abusive tweets show a strong relationship with total tweets received, indicating for the most part impersonality, but a second pathway targets less prominent individuals,...
Previous work has demonstrated the success of statistical language models (SLMs) when enough training data is available [1], but despite that, grammar-based systems are proving the preferred choice in successful commercial systems such as HeyAnita [2], BeVocal [3] and Tellme [4], largely due to the difficulty involved in obtaining a corpus of training data. Here we trained an SLM on data obtained using a grammar-based system and compared the performance of the two systems with regard to recognition. We also parsed the output of the SLM with a robust parser and compared the accuracy of the semantic output of the two systems. The...
Since its foundation in 2006, Twitter has enjoyed a meteoric rise in popularity, currently boasting over 500 million users. Its short text nature means that the service is open to a variety of different usage patterns, which have evolved rapidly in terms of user base and utilization. Prior work has categorized Twitter users, as well as studied the use of lists and re‐tweets and how these can be used to infer user profiles and interests. The focus of this article is on studying why and how users mark tweets as “favorites”, a functionality with poorly...
Research into the automatic acquisition of subcategorization frames (SCFs) from corpora is starting to produce large-scale computational lexicons which include valuable frequency information. However, the accuracy of the resulting lexicons shows room for improvement. One significant source of error lies in the statistical filtering used by some researchers to remove noise from automatically acquired subcategorization frames. In this paper, we compare three different approaches to filtering out spurious hypotheses. Two hypothesis tests perform poorly,...
The Generalized Hebbian Algorithm (GHA) is shown to be equivalent to Latent Semantic Analysis (LSA), and applicable to a range of LSA-style tasks. GHA is a learning algorithm which converges on an approximation of the eigen decomposition of an unseen frequency matrix given observations presented in sequence. Use of GHA allows very large datasets to be processed.
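The idea of learning an eigen decomposition from observations presented one at a time can be sketched with the classic form of the Generalized Hebbian Algorithm (Sanger's rule). This is a minimal pure-Python illustration, not the implementation used in the paper; the function name `gha_train`, the learning rate, the epoch count and the seeded initialisation are all assumptions.

```python
import random


def gha_train(samples, n_components, lr=0.01, epochs=20, seed=0):
    """Estimate leading eigenvectors of the data's second-moment matrix.

    Each sample is seen once per epoch, so the full matrix is never
    built in memory. Returns one weight vector per component.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Small random starting weights (hypothetical initialisation).
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
         for _ in range(n_components)]
    for _ in range(epochs):
        for x in samples:
            # Project the observation onto every current weight vector.
            y = [sum(wi[d] * x[d] for d in range(dim)) for wi in w]
            residual = list(x)
            new_w = []
            for i in range(n_components):
                # Remove what components 0..i already explain, then
                # apply the Hebbian update to component i.
                residual = [residual[d] - y[i] * w[i][d] for d in range(dim)]
                new_w.append([w[i][d] + lr * y[i] * residual[d]
                              for d in range(dim)])
            w = new_w
    return w
```

Because each update touches only the current observation and the weight vectors, memory use depends on the number of components and dimensions rather than on the number of observations, which is what makes the approach attractive for very large datasets.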
The rapidly expanding voice recognition industry has so far shown a preference for grammar-based language modelling, despite the better overall performance of statistical language modelling. Given that the advantages of the grammar-based approach make it unlikely to be replaced as the primary solution in the near future, it is natural to wonder whether some combination of the two approaches may prove useful. Here, we describe an implemented system that uses statistical language modelling and a decision-tree classifier to provide the user with feedback when grammar-based recognition fails....
Purpose: The purpose of this paper is to describe: a new taxonomy of metacognitive skills designed to support the study of metacognition in the context of web searching; a data collection instrument based on the taxonomy; and the results of testing the instrument on a sample of university students and staff. Design/methodology/approach: The taxonomy is based on a review of the literature, extended to cover web searching. This forms the basis for the design of the data collection instrument, which is tested with 405 students and staff of Sheffield University. Findings: Subjects regard the range of metacognitive skills focused on as broadly similar. However, a number...
This is the proposal for RumourEval-2019, which will run in early 2019 as part of that year's SemEval event. Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the dangers of "fake news" have become a mainstream concern. Yet automated support for rumour checking remains in its infancy. For this reason, it is important that a shared task in this area continues to provide a focus for effort, which is likely to increase. We therefore propose a continuation in which the veracity of further rumours is determined, and as previously, supportive...
The rapid proliferation of microblogs such as Twitter has resulted in a vast quantity of written text becoming available that contains interesting information for NLP tasks. However, the noise level in tweets is so high that standard NLP tools perform poorly. In this paper, we present a statistical truecaser for tweets using a 3-gram language model built with truecased newswire texts and tweets. Our truecasing method shows an improvement in named entity recognition and part-of-speech tagging.
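For readers unfamiliar with truecasing, the sketch below shows a much simpler baseline than the 3-gram language model described above: it restores each token to the casing it most often has in truecased training text, ignoring context entirely. The function names and the choice to skip sentence-initial tokens are assumptions made for this illustration, not details from the paper.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent surface casing of each lowercased token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Skip the first token: its capital letter may only mark
        # the start of the sentence rather than the word's true case.
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0]
            for lower, forms in counts.items()}


def truecase(text, model):
    """Replace known tokens with their learned casing; keep others as-is."""
    return " ".join(model.get(token.lower(), token) for token in text.split())


model = train_truecaser([
    "Yesterday London was sunny",
    "He visited London and Paris",
])
restored = truecase("i love london", model)
```

A context-aware model such as the 3-gram approach can distinguish cases this baseline cannot, for example a word that is a common noun in one context and a proper name in another.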