Łukasz Roguski

ORCID: 0000-0003-2764-962X
Research Areas
  • Algorithms and Data Compression
  • Topic Modeling
  • Disaster Response and Management
  • Emergency and Acute Care Studies
  • Data-Driven Disease Surveillance
  • Advanced Data Storage Technologies
  • Biomedical Text Mining and Ontologies
  • Genomics and Phylogenetic Studies
  • Machine Learning in Healthcare
  • COVID-19 Clinical Research Studies
  • Sepsis Diagnosis and Treatment
  • Gene expression and cancer classification
  • Data Quality and Management
  • Natural Language Processing Techniques
  • Meta-analysis and systematic reviews
  • Long-Term Effects of COVID-19
  • Plant nutrient uptake and metabolism
  • Electronic Health Records Systems
  • Machine Learning and Data Classification
  • Domain Adaptation and Few-Shot Learning
  • Technology and Data Analysis
  • Fluid Dynamics Simulations and Interactions
  • Lattice Boltzmann Simulation Studies
  • Mental Health Research Topics
  • Fluid Dynamics and Heat Transfer

University College London
2019-2022

National Health Service
2022

National Institute for Health Research
2022

Health Data Research UK
2020-2021

University of London
2020-2021

University College London Hospitals NHS Foundation Trust
2021

UCL Biomedical Research Centre
2020

Royal London Hospital
2020

Centro Nacional de Análisis Genómico
2014-2018

Centre for Genomic Regulation
2017-2018

The SARS-CoV-2 virus binds to the angiotensin-converting enzyme 2 (ACE2) receptor for cell entry. It has been suggested that angiotensin-converting enzyme inhibitors (ACEi) and angiotensin II receptor blockers (ARB), which are commonly used in patients with hypertension or diabetes and may raise tissue ACE2 levels, could increase the risk of severe COVID-19 infection.

10.1002/ejhf.1924 article EN cc-by European Journal of Heart Failure 2020-06-01

Background: The National Early Warning Score (NEWS2) is currently recommended in the UK for risk stratification of COVID-19 patients, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for outcome prediction and to identify and validate a set of blood and physiological parameters routinely collected at hospital admission to improve upon the use of NEWS2 alone for medium-term risk stratification. Methods: Training cohorts comprised 1276 patients admitted to King’s College Hospital National Health Service...

10.1186/s12916-020-01893-3 article EN cc-by BMC Medicine 2021-01-21

Aims: The SARS-Cov2 virus binds to the ACE2 receptor for cell entry. It has been suggested that ACE-inhibitors (ACEi) and Angiotensin-2 Blockers (ARB), which are commonly used in patients with hypertension or diabetes and may raise ACE2 levels, could increase the risk of severe COVID19 infection. Methods and Results: We evaluated this hypothesis in a consecutive cohort of 1200 acute inpatients at two hospitals with a multi-ethnic catchment population in London (UK). The mean age was 68±17 years (57% male) and 74% had at least 1...

10.1101/2020.04.07.20056788 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2020-04-11

Summary: Modern sequencing platforms produce huge amounts of data. Archiving them raises major problems but is crucial for reproducibility of results, one of the most fundamental principles of science. The widely used gzip compressor, used for reduction of storage and transfer costs, is not a perfect solution, so a few specialized FASTQ compressors were proposed recently. Unfortunately, they are often impractical because of slow processing, lack of support for some variants of FASTQ files or instability. We propose DSRC 2 that...

10.1093/bioinformatics/btu208 article EN Bioinformatics 2014-04-18

Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk based, where the better of these two, from Cox et al. (2012), is based on the Burrows–Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome...

10.1093/bioinformatics/btu844 article EN Bioinformatics 2014-12-22

Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records (EHR) may help, but much of the information is in free text rather than a computable form. We applied natural language processing (NLP) to EHR data using the CogStack platform to simulate recruitment into the LeoPARDS study,...

10.1109/jbhi.2020.2977925 article EN IEEE Journal of Biomedical and Health Informatics 2020-03-10

Background: The National Early Warning Score (NEWS2) is currently recommended in the United Kingdom for risk stratification of COVID outcomes, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for outcome prediction and to identify and validate a set of routinely-collected blood and physiological parameters taken at hospital admission to improve the score. Methods: Training cohorts comprised 1276 patients admitted to King’s College Hospital NHS Foundation Trust with COVID-19 disease from 1...

10.1101/2020.04.24.20078006 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2020-04-29

The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw data. These data must be stored, processed and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. FaStore does not use any reference sequences for compression and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. FaStore in the lossless mode achieves an improvement in compression ratio with respect...

10.1093/bioinformatics/bty205 article EN Bioinformatics 2018-03-27

The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning (ML) systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training rather than separate...

10.1016/j.nlp.2024.100076 article EN cc-by-nc-nd Natural Language Processing Journal 2024-04-30

Recent studies have shown that an automated, lifespan-inclusive, transdiagnostic, and clinically based, individualized risk calculator provides a powerful system for supporting the early detection of individuals at risk of psychosis at a large scale, by leveraging electronic health records (EHRs). This risk calculator has been externally validated twice and is undergoing feasibility testing for clinical implementation. Integration of this risk calculator in clinical routine should be facilitated by prospective studies, which are required to address...

10.3791/60794 article EN Journal of Visualized Experiments 2020-05-15

As more health care organizations transition to using electronic health record (EHR) systems, it is important for these organizations to maximize the secondary use of their data to support service improvement and clinical research. These organizations will find it challenging to have systems capable of harnessing the unstructured fields in the record (clinical notes, letters, etc) and, practically, to have such systems interact with all hospital systems (legacy and current).

10.2196/38122 article EN cc-by JMIR Medical Informatics 2022-08-24

Biomedical documents such as Electronic Health Records (EHRs) contain a large amount of information in an unstructured format. The data in EHRs is a hugely valuable resource documenting clinical narratives and decisions, but whilst the text can be easily understood by human doctors it is challenging to use in research applications. To uncover the potential of biomedical documents we need to extract and structure the information they contain. The task at hand is Named Entity Recognition and Linking (NER+L). The number of entities, ambiguity of words, overlapping...

10.48550/arxiv.1912.10166 preprint EN other-oa arXiv (Cornell University) 2019-01-01

The recent super-exponential growth in the amount of sequencing data generated worldwide has put techniques for compressed storage into focus. Most available solutions, however, are strictly tied to specific bioinformatics formats, sometimes inheriting from them suboptimal design choices; this hinders flexible and effective data sharing. Here we present CARGO (Compressed ARchiving for GenOmics), a high-level framework to automatically generate software systems optimized for arbitrary types of large genomic...

10.1093/nar/gkw318 article EN cc-by Nucleic Acids Research 2016-04-29

Recent studies have shown that an automated, lifespan-inclusive, transdiagnostic, and clinically based, individualized risk calculator provides a powerful system for supporting the early detection of individuals at risk of psychosis at a large scale, by leveraging electronic health records (EHRs). This risk calculator has been externally validated twice and is undergoing feasibility testing for clinical implementation. Integration of this risk calculator in clinical routine should be facilitated by prospective studies, which are required to address...

10.3791/60794-v article EN Journal of Visualized Experiments 2020-05-15

The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training rather than training separate models. Motivated...

10.48550/arxiv.2308.08234 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw data. These data must be stored, processed, and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. The proposed algorithm does not use any reference sequences for compression, and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. We demonstrate through extensive...

10.1101/168096 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2017-07-25

Clinical trials often fail on recruiting an adequate number of appropriate patients. Identifying eligible trial participants is a resource-intensive task when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records has been explored as a way of identifying participants, but much of the information is in unstructured free text rather than a computable form. We developed an electronic health record pipeline that combines structured data...

10.1101/19005603 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2019-09-09

Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customising and training IE models; and c) integrations to the broader CogStack...

10.48550/arxiv.2010.01165 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk-based (Yanovsky, 2011; Cox et al., 2012), where the better of these two, from Cox et al. (2012), is based on the Burrows–Wheeler transform (BWT) and achieves 0.518 bits per...

10.48550/arxiv.1405.6874 preprint EN other-oa arXiv (Cornell University) 2014-01-01

In this paper we present the vortex-in-cell method aimed at graphics processing units. An inviscid fluid model is examined in a domain with periodic boundary conditions. Simulation results for leap-frogging vortex rings are presented, along with a sample collision visualization. At the end, the performance advantage of the GPU solver over the CPU solver is presented.

10.21936/si2013_v34.n1.5 article EN Studia Informatica System and information technology 2013-02-09

Transformer-based models have greatly advanced the progress in the field of natural language processing and while they achieve state-of-the-art results on a wide range of tasks, they are cumbersome in parameter size. Subsequently, even when pre-trained transformer models are used for fine-tuning on a given task, if the dataset is large, it may still not be feasible to fine-tune the model within a reasonable time. For this reason, we empirically test 8 subsampling methods for reducing the dataset size on a text classification task and report the trade-off...

10.18653/v1/2021.sustainlp-1.11 article EN cc-by 2021-01-01