- Biomedical Text Mining and Ontologies
- Topic Modeling
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Advanced Text Analysis Techniques
- Machine Learning in Healthcare
- Bioinformatics and Genomic Networks
- Computational Drug Discovery Methods
- Electronic Health Records Systems
- Text Readability and Simplification
- Social Media in Health Education
- Genomics and Phylogenetic Studies
- Pharmacovigilance and Adverse Drug Reactions
- Scientific Computing and Data Management
- Linguistics and Terminology Studies
- Genetics, Bioinformatics, and Biomedical Research
- Intimate Partner and Family Violence
- Mental Health via Writing
- Health Literacy and Information Accessibility
- Ethics in Clinical Research
- Artificial Intelligence in Healthcare
- Web Data Mining and Analysis
- Machine Learning in Bioinformatics
- Patient-Provider Communication in Healthcare
- Psychopathy, Forensic Psychiatry, Sexual Offending
University of Manchester
2016-2025
UNSW Sydney
2024
The Alan Turing Institute
2019-2021
Open Text (Canada)
2019-2021
Turing Institute
2020
Manchester University
2019
Cancer Research UK Manchester Institute
2013-2017
Serbian Academy of Sciences and Arts
2017
Farr Institute
2015-2017
Manchester Academic Health Science Centre
2014-2017
Abstract Background The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles. Results In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text...
Background: Use of the social media website Twitter is highly prevalent and has led to a plethora of Web-based health-related data available for use by researchers. As such, researchers are increasingly using data from Twitter to retrieve and analyze mental health content. However, there is limited evidence regarding why people use this emerging platform to discuss mental health problems in the first place. Objectives: The aim of this study was to explore the reasons why individuals discuss mental health on Twitter. The study was the first of its kind to implement a study-specific hashtag for research; therefore, we...
In animal-based biomedical research, both the sex and the age of the animals studied affect disease phenotypes by modifying their susceptibility, presentation and response to treatment. The accurate reporting of experimental methods and materials, including the sex and age of animals, is essential so that other researchers can build on the results of such studies. Here we use text mining to study 15,311 research papers in which mice were the focus of the study. We find that the percentage of papers reporting the sex and age of mice has increased over the past two decades: however, only about 50% of the papers published...
Abstract Objective We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR...
Abstract Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the Gnat Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add...
OBJECTIVE The authors present a system developed for the Challenge in Natural Language Processing for Clinical Data, the i2b2 obesity challenge, whose aim was to automatically identify the status of obesity and 15 related co-morbidities in patients using their clinical discharge summaries. The challenge consisted of two tasks, textual and intuitive. The textual task was to identify explicit references to the diseases, whereas the intuitive task focused on the prediction of the disease status when the evidence was not explicitly asserted. DESIGN The authors assembled a set of resources to lexically and semantically...
Identification of clinical events (eg, problems, tests, treatments) and associated temporal expressions (eg, dates and times) are key tasks in extracting and managing data from electronic health records. As part of the i2b2 2012 Natural Language Processing for Clinical Data challenge, we developed and evaluated a system to automatically extract temporal expressions and events from clinical narratives. The extracted temporal expressions were additionally normalized by assigning type, value, and modifier. The system combines rule-based and machine learning approaches that rely on morphological,...
Background Use of routinely collected patient data for research and service planning is an explicit policy of the UK National Health Service and UK government. Much clinical information is recorded in free-text letters, reports and notes. These text data are generally lost to research, due to the increased privacy risk compared with structured data. We conducted a citizens’ jury which asked members of the public whether their medical free-text data should be shared for research for public benefit, to inform an ethical policy. Methods Eighteen citizens took part over 3...
Clinical text and documents contain very rich information and knowledge in healthcare, and their processing using state-of-the-art language technology becomes very important for building intelligent systems for supporting healthcare and social good. This processing includes creating language understanding models and translating resources into other natural languages to share domain-specific cross-lingual knowledge. In this work, we conduct investigations on clinical text machine translation by examining multilingual neural network models using deep...
Effective Big Data Mining requires scalable and efficient solutions that are also accessible to users of all levels of expertise. Despite this, many current efforts to provide effective knowledge extraction via large-scale Big Data Mining tools focus more on performance than on use and tuning, which are complex problems even for experts. Weka is a popular and comprehensive Data Mining workbench with a well-known and intuitive interface; nonetheless it supports only sequential single-node execution. Hence, the size of the datasets and processing tasks that Weka can...
A recent promise to access unstructured clinical data from electronic health records on large-scale has revitalized the interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven (machine learning) methods with a large...
Although the amount of data in biology is rapidly increasing, critical information for understanding biological events like phosphorylation or gene expression remains locked in the biomedical literature. Most current text mining (TM) approaches to extract information about biological events are focused on either limited-scale studies and/or abstracts, with data extracted lacking context and rarely available to support further research. Here we present BioContext, an integrated TM system which extracts, extends and integrates results from a...
Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative...
Mobile application (app) websites such as Google Play and AppStore allow users to review their downloaded apps. Such reviews can be useful for app users, as they may help users make an informed decision; such reviews can also be potentially useful for app developers, if they contain valuable information concerning user needs and requirements. However, in order to unleash the value of app reviews for mobile app development, intelligent mining tools that can discern relevant reviews from irrelevant ones must be provided. This paper surveys the state of the art in the development of such tools and techniques behind...
Abstract Important clinical information is recorded in free text in patients’ records, notes, letters and reports in healthcare settings. This free text is currently under-used for health research and innovation. Free text requires more processing for analysis than structured data, but natural language processing at scale has recently advanced, using large language models. However, data controllers are often concerned about patient privacy risks if clinical free text is allowed to be used in research. Text can be de-identified, yet it is challenging to quantify the residual...
Objective: This study presents a system developed for the 2009 i2b2 Challenge in Natural Language Processing for Clinical Data, whose aim was to automatically extract certain information about medications used by a patient from his/her medical report. The aim was to extract the following information for each medication: name, dosage, mode/route, frequency, duration and reason. Design: The system implements a rule-based methodology, which exploits typical morphological, lexical, syntactic and semantic features of the targeted information. These features were...
Background The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms dedicated teams manually curate publications about genes; however, for species with no such dedicated staff, thousands...
In recent years, social media websites have been suggested as a novel, vast source of data which may be useful for deriving drug safety information. Despite this, there are few published reports of drug safety profiles derived in this way. The aims of this study were to detect and quantify glucocorticoid-related adverse events using a computerised system for automated detection of suspected adverse drug reactions (ADR) from narrative text in Twitter, and to compare the frequency of specific ADR mentions within Twitter to patterns of spontaneous reporting...