- Algorithms and Data Compression
- Topic Modeling
- Disaster Response and Management
- Emergency and Acute Care Studies
- Data-Driven Disease Surveillance
- Advanced Data Storage Technologies
- Biomedical Text Mining and Ontologies
- Genomics and Phylogenetic Studies
- Machine Learning in Healthcare
- COVID-19 Clinical Research Studies
- Sepsis Diagnosis and Treatment
- Gene expression and cancer classification
- Data Quality and Management
- Natural Language Processing Techniques
- Meta-analysis and systematic reviews
- Long-Term Effects of COVID-19
- Plant nutrient uptake and metabolism
- Electronic Health Records Systems
- Machine Learning and Data Classification
- Domain Adaptation and Few-Shot Learning
- Technology and Data Analysis
- Fluid Dynamics Simulations and Interactions
- Lattice Boltzmann Simulation Studies
- Mental Health Research Topics
- Fluid Dynamics and Heat Transfer
University College London (2019-2022)
National Health Service (2022)
National Institute for Health Research (2022)
Health Data Research UK (2020-2021)
University of London (2020-2021)
University College London Hospitals NHS Foundation Trust (2021)
UCL Biomedical Research Centre (2020)
Royal London Hospital (2020)
Centro Nacional de Análisis Genómico (2014-2018)
Centre for Genomic Regulation (2017-2018)
The SARS-CoV-2 virus binds to the angiotensin-converting enzyme 2 (ACE2) receptor for cell entry. It has been suggested that angiotensin-converting enzyme inhibitors (ACEi) and angiotensin II receptor blockers (ARB), which are commonly used in patients with hypertension or diabetes and may raise tissue ACE2 levels, could increase the risk of severe COVID-19 infection.
Background: The National Early Warning Score (NEWS2) is currently recommended in the UK for the risk stratification of COVID-19 patients, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for the prediction of severe COVID-19 outcome, and to identify and validate a set of blood and physiological parameters routinely collected at hospital admission that improve upon the use of NEWS2 alone for medium-term risk stratification. Methods: Training cohorts comprised 1276 patients admitted to King’s College Hospital Health Service...
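NEWS2 is an aggregate of seven per-parameter sub-scores, so the baseline that the study tries to improve upon is simple to compute. The following is a minimal sketch assuming the SpO2 Scale 1 bands of the Royal College of Physicians NEWS2 chart as we understand them; the band tables should be checked against the official chart before any real use, and this is not the extended blood-parameter model evaluated in the study. The names `Vitals`, `band_points` and `news2` are hypothetical.

```python
from dataclasses import dataclass

INF = float("inf")

# (inclusive upper bound, points) per parameter. Bands transcribed from the
# RCP NEWS2 chart as we understand it (assumption: verify before clinical use).
RESP_RATE = [(8, 3), (11, 1), (20, 0), (24, 2), (INF, 3)]
SPO2_SCALE1 = [(91, 3), (93, 2), (95, 1), (INF, 0)]
SYSTOLIC_BP = [(90, 3), (100, 2), (110, 1), (219, 0), (INF, 3)]
PULSE = [(40, 3), (50, 1), (90, 0), (110, 1), (130, 2), (INF, 3)]
TEMPERATURE = [(35.0, 3), (36.0, 1), (38.0, 0), (39.0, 1), (INF, 2)]


@dataclass
class Vitals:
    resp_rate: float    # breaths per minute
    spo2: float         # oxygen saturation, %
    on_oxygen: bool     # True if on supplemental oxygen rather than room air
    systolic_bp: float  # mmHg
    pulse: float        # beats per minute
    alert: bool         # False for new confusion, voice/pain response only, or unresponsive
    temperature: float  # degrees Celsius


def band_points(value: float, bands: list) -> int:
    """Return the points of the first band whose upper bound covers `value`."""
    for upper, points in bands:
        if value <= upper:
            return points
    raise ValueError(f"no band for {value}")  # only reachable for NaN


def news2(v: Vitals) -> int:
    """Sum the seven NEWS2 sub-scores (hypothetical helper)."""
    return (
        band_points(v.resp_rate, RESP_RATE)
        + band_points(v.spo2, SPO2_SCALE1)
        + (2 if v.on_oxygen else 0)
        + band_points(v.systolic_bp, SYSTOLIC_BP)
        + band_points(v.pulse, PULSE)
        + (0 if v.alert else 3)
        + band_points(v.temperature, TEMPERATURE)
    )
```

For example, a patient breathing room air with unremarkable vitals scores 0, while tachypnoea, hypoxia on oxygen and new confusion each push the total up quickly.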
Aims: The SARS-CoV-2 virus binds to the ACE2 receptor for cell entry. It has been suggested that ACE inhibitors (ACEi) and angiotensin II receptor blockers (ARB), which are commonly used in patients with hypertension or diabetes and may raise ACE2 levels, could increase the risk of severe COVID-19 infection. Methods and results: We evaluated this hypothesis in a consecutive cohort of 1200 acute inpatients at two hospitals with a multi-ethnic catchment population in London (UK). The mean age was 68 ± 17 years (57% male) and 74% had at least 1...
Summary: Modern sequencing platforms produce huge amounts of data. Archiving them raises major problems but is crucial for the reproducibility of results, one of the most fundamental principles of science. The widely used gzip compressor, used to reduce storage and transfer costs, is not a perfect solution, so a few specialized FASTQ compressors have been proposed recently. Unfortunately, they are often impractical because of slow processing, lack of support for some variants of FASTQ files, or instability. We propose DSRC 2 that...
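A FASTQ record is four lines (header, bases, a `+` separator, quality string), and gzip treats the whole file as one undifferentiated byte stream. A common design in specialized FASTQ compressors, assumed here for illustration and not taken from the DSRC 2 paper, is to split records into header, sequence and quality streams so each can be modelled separately. The sketch below uses Python's standard `gzip` module on every stream; the toy records and the helper names `split_streams`, `join_streams` and `gz_size` are hypothetical.

```python
import gzip

# Hypothetical toy input: 1000 four-line FASTQ records.
FASTQ = "".join(
    f"@read{i}\nACGTTGCAAGGCTTAC\n+\nIIIIHHHHGGGGFFFF\n" for i in range(1000)
)


def split_streams(fastq: str) -> tuple[str, str, str]:
    """Separate headers, bases and qualities into three newline-joined streams."""
    lines = fastq.splitlines()
    return "\n".join(lines[0::4]), "\n".join(lines[1::4]), "\n".join(lines[3::4])


def join_streams(headers: str, seqs: str, quals: str) -> str:
    """Rebuild the FASTQ text; assumes every separator line is a bare '+'."""
    records = zip(headers.split("\n"), seqs.split("\n"), quals.split("\n"))
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)


def gz_size(text: str) -> int:
    """Size in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("ascii")))


whole_file = gz_size(FASTQ)                                  # gzip on the raw file
per_stream = sum(gz_size(s) for s in split_streams(FASTQ))   # gzip on each stream
```

Splitting is lossless here because the separator carries no information; real tools replace gzip on each stream with models specialized for bases and quality values.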
Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk based (Yanovsky, 2011; Cox et al., 2012), where the better of these two, from Cox et al. (2012), is based on the Burrows–Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome...
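The Burrows–Wheeler transform permutes a string so that characters followed by similar contexts end up adjacent, which is why overlapping reads become highly compressible after it. Below is a minimal textbook sketch of the forward transform (sorting all rotations) and its inverse via the last-to-first mapping. It is quadratic and in-memory, so it only illustrates the transform itself and not the disk-based construction of Cox et al. (2012); the function names are hypothetical.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Forward BWT by sorting all rotations of text + sentinel.

    Assumes the sentinel sorts before every character in `text`
    (true for '$' versus DNA letters A, C, G, T, N).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)  # last column


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert the BWT using the last-to-first (LF) mapping."""
    counts: dict[str, int] = {}
    ranks = []  # ranks[i] = occurrences of last[i] before position i
    for c in last:
        ranks.append(counts.get(c, 0))
        counts[c] = counts.get(c, 0) + 1
    first = {}  # first row of each character in the sorted first column
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    row, out = 0, []  # row 0 is the rotation that starts with the sentinel
    for _ in range(len(last) - 1):
        c = last[row]
        out.append(c)  # characters are recovered from the end backwards
        row = first[c] + ranks[row]
    return "".join(reversed(out))
```

For scale, 0.518 bits per base on 134.0 Gbp is about 0.518 × 134.0 × 10^9 / 8 ≈ 8.68 GB, against 33.5 GB for a plain 2-bit-per-base encoding.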
Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated screening of electronic health records (EHR) may help, but much of the information is in free text rather than in a computable form. We applied natural language processing (NLP) to EHR data using the CogStack platform to simulate recruitment into the LeoPARDS study,...
Background: The National Early Warning Score (NEWS2) is currently recommended in the United Kingdom for the risk stratification of COVID-19 outcomes, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for the prediction of severe COVID-19 outcome, and to identify and validate a set of routinely collected blood and physiological parameters taken at hospital admission that improve the score. Methods: Training cohorts comprised 1276 patients admitted to King’s College Hospital NHS Foundation Trust with COVID-19 disease from 1...
The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw data. These data must be stored, processed and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. FaStore does not use any reference sequences for compression and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. FaStore in lossless mode achieves an improvement in compression ratio with respect...
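Quality strings usually dominate the size of a compressed FASTQ file, and a typical lossy option is to coarsen them. As an illustration only, here is a sketch of quality-score binning: each Phred score is replaced by a representative value for its bin, leaving far fewer distinct symbols for the back-end coder. The four-bin table, the Phred+33 encoding and the name `bin_qualities` are assumptions; FaStore's actual lossy modes may work differently.

```python
# Hypothetical 4-level binning table: (highest Phred score in bin, value written out).
BINS = [(9, 6), (19, 15), (29, 25), (float("inf"), 37)]
PHRED_OFFSET = 33  # assumes Sanger / Illumina 1.8+ ASCII encoding


def bin_qualities(qual: str) -> str:
    """Map every quality character to its bin representative (lossy)."""
    out = []
    for ch in qual:
        q = ord(ch) - PHRED_OFFSET
        for upper, rep in BINS:
            if q <= upper:
                out.append(chr(rep + PHRED_OFFSET))
                break
    return "".join(out)
```

The transformation cannot be undone, which is the trade-off such modes offer: lower entropy in exchange for the exact quality values.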
The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning (ML) systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training rather than separate...
Recent studies have shown that an automated, lifespan-inclusive, transdiagnostic, and clinically based, individualized risk calculator provides a powerful system for supporting the early detection of individuals at risk of psychosis at a large scale, by leveraging electronic health records (EHRs). This calculator has been externally validated twice and is undergoing feasibility testing for clinical implementation. Its integration into routine clinical care should be facilitated by prospective studies, which are required to address...
As more health care organizations transition to using electronic health record (EHR) systems, it is important for these organizations to maximize the secondary use of their data to support service improvement and clinical research. These organizations will find it challenging to have systems capable of harnessing the unstructured fields in the record (clinical notes, letters, etc.) and, more practically, to have such systems interact with all of the hospital's data systems (legacy and current).
Biomedical documents such as Electronic Health Records (EHRs) contain a large amount of information in an unstructured format. The data in EHRs is a hugely valuable resource documenting clinical narratives and decisions, but whilst the text can be easily understood by human doctors, it is challenging to use in research applications. To uncover the potential of biomedical documents, we need to extract the structure they contain. The task at hand is Named Entity Recognition and Linking (NER+L). The number of entities, the ambiguity of words, overlapping...
The recent super-exponential growth in the amount of sequencing data generated worldwide has put techniques for compressed storage into focus. Most available solutions, however, are strictly tied to specific bioinformatics formats, sometimes inheriting suboptimal design choices from them; this hinders flexible and effective sharing. Here we present CARGO (Compressed ARchiving for GenOmics), a high-level framework to automatically generate software systems optimized for arbitrary types of large genomic...
The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw data. These data must be stored, processed, and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. The proposed algorithm does not use any reference sequences for compression, and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. We demonstrate through extensive...
Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is a resource-intensive task when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated screening of electronic health records has been explored as a way of identifying trial participants, but much of the information is in unstructured free text rather than in a computable form. We developed an electronic health record pipeline that combines structured data...
Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary, including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customising and training IE models; c) integrations with the broader CogStack...
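At its simplest, NER+L maps spans of text to concept identifiers in a vocabulary. The sketch below is a naive greedy longest-match dictionary lookup, shown only to make the task concrete: it is not MedCAT's self-supervised algorithm and does nothing about ambiguous words, which is the hard part the abstracts point to. The vocabulary, the concept IDs and the function `link_concepts` are all hypothetical; real systems would use UMLS or SNOMED-CT identifiers.

```python
import re

# Hypothetical vocabulary: surface form -> made-up concept ID.
VOCAB = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "hypertension": "CONCEPT_HTN",
    "type 2 diabetes": "CONCEPT_T2DM",
}


def link_concepts(text: str, vocab: dict, max_len: int = 3) -> list:
    """Return (matched phrase, concept ID) pairs via greedy longest match."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    i, found = 0, []
    while i < len(tokens):
        # Try the longest candidate n-gram starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in vocab:
                found.append((phrase, vocab[phrase]))
                i += n
                break
        else:
            i += 1  # no match starting here; move to the next token
    return found
```

Note that two synonyms ("heart attack", "myocardial infarction") link to the same concept, which is what makes the extracted structure usable for downstream analysis.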
Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk-based (Yanovsky, 2011; Cox et al., 2012), where the better of these two, from Cox et al. (2012), is based on the Burrows–Wheeler transform (BWT) and achieves 0.518 bits per...
In this paper we present the vortex-in-cell method aimed at graphics processing units (GPUs). An inviscid fluid model is examined in a domain with periodic boundary conditions. Simulation results for leap-frogging vortex rings are presented, along with a sample collision visualization. Finally, the performance advantage of the GPU solver over the CPU is presented.
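A vortex-in-cell step typically transfers particle vorticity to a mesh, solves a Poisson equation for the stream function on that mesh (cheap with FFTs under periodic boundary conditions), and interpolates the resulting velocity back to move the particles. The sketch below shows only the first transfer, in 2D for brevity, using cloud-in-cell (bilinear) weights with periodic wrap-around. The abstract does not say which interpolation kernel the GPU solver uses, so the kernel choice and the function name `deposit_cic` are assumptions.

```python
import math


def deposit_cic(particles, n, length):
    """Spread particle circulation onto an n x n periodic grid.

    particles: iterable of (x, y, gamma) with 0 <= x, y < length.
    Returns omega[i][j], the grid vorticity, where i indexes x and j indexes y.
    """
    h = length / n  # grid spacing
    omega = [[0.0] * n for _ in range(n)]
    for x, y, gamma in particles:
        gx, gy = x / h, y / h
        i0, j0 = math.floor(gx), math.floor(gy)
        fx, fy = gx - i0, gy - j0  # fractional offsets inside the cell
        for di, wx in ((0, 1.0 - fx), (1, fx)):
            for dj, wy in ((0, 1.0 - fy), (1, fy)):
                # Bilinear weights sum to 1; indices wrap for periodicity.
                omega[(i0 + di) % n][(j0 + dj) % n] += gamma * wx * wy / (h * h)
    return omega
```

Because the four weights of each particle sum to one, total circulation (the grid sum times h²) is conserved by the transfer, a useful sanity check before the Poisson solve.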
Transformer-based models have greatly advanced the progress in the field of natural language processing, and while they achieve state-of-the-art results on a wide range of tasks, they are cumbersome in parameter size. Consequently, even when a pre-trained transformer is used for fine-tuning on a given task, if the dataset is large, it may still not be feasible to fine-tune the model within a reasonable time. For this reason, we empirically test 8 subsampling methods for reducing the dataset size on a text classification task and report the trade-off...
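The abstract does not name the eight subsampling methods, so as a hypothetical illustration of the simplest kind of baseline, here is stratified random subsampling: it keeps a fixed fraction of each class so that label proportions survive the size reduction. The function name `stratified_subsample`, the rounding rule and the one-example-per-class floor are assumptions, not details from the paper.

```python
import random
from collections import defaultdict


def stratified_subsample(texts, labels, fraction, seed=0):
    """Keep about `fraction` of the examples of every class (at least one each)."""
    rng = random.Random(seed)  # seeded for reproducible subsets
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    keep = []
    for idxs in by_label.values():
        k = max(1, round(len(idxs) * fraction))
        keep.extend(rng.sample(idxs, k))
    keep.sort()  # preserve the original ordering of the dataset
    return [texts[i] for i in keep], [labels[i] for i in keep]
```

Fine-tuning time then scales roughly with the retained fraction, and the accuracy lost relative to the full dataset is the trade-off being measured.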