NFDI4DS | UHH-SEMS - Publication Details

Łukasz Roguski

ORCID: 0000-0003-2764-962X

About

Contact & Profiles

OpenAlex author ID: A5076395273

Research Areas

Algorithms and Data Compression
Topic Modeling
Disaster Response and Management
Emergency and Acute Care Studies
Data-Driven Disease Surveillance
Advanced Data Storage Technologies
Biomedical Text Mining and Ontologies
Genomics and Phylogenetic Studies
Machine Learning in Healthcare
COVID-19 Clinical Research Studies
Sepsis Diagnosis and Treatment
Gene expression and cancer classification
Data Quality and Management
Natural Language Processing Techniques
Meta-analysis and systematic reviews
Long-Term Effects of COVID-19
Plant nutrient uptake and metabolism
Electronic Health Records Systems
Machine Learning and Data Classification
Domain Adaptation and Few-Shot Learning
Technology and Data Analysis
Fluid Dynamics Simulations and Interactions
Lattice Boltzmann Simulation Studies
Mental Health Research Topics
Fluid Dynamics and Heat Transfer

University College London
2019-2022

National Health Service
2022

National Institute for Health Research
2022

Health Data Research UK
2020-2021

University of London
2020-2021

University College London Hospitals NHS Foundation Trust
2021

UCL Biomedical Research Centre
2020

Royal London Hospital
2020

Centro Nacional de Análisis Genómico
2014-2018

Centre for Genomic Regulation
2017-2018

Angiotensin‐converting enzyme inhibitors and angiotensin II receptor blockers are not associated with severe COVID‐19 infection in a multi‐site UK acute hospital trust

OPENALEX - Publications

Daniel Bean, Željko Kraljević, Thomas Searle, Rebecca Bendayan, Kevin O’Gallagher, and 9 more

The SARS-CoV-2 virus binds to the angiotensin-converting enzyme 2 (ACE2) receptor for cell entry. It has been suggested that angiotensin-converting enzyme inhibitors (ACEi) and angiotensin II receptor blockers (ARB), which are commonly used in patients with hypertension or diabetes and may raise tissue ACE2 levels, could increase the risk of severe COVID-19 infection.

10.1002/ejhf.1924 article EN cc-by European Journal of Heart Failure 2020-06-01

Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit

Željko Kraljević, Thomas Searle, Anthony Shek, Łukasz Roguski, Kawsar Noor, and 13 more

10.1016/j.artmed.2021.102083 article EN Artificial Intelligence in Medicine 2021-05-01

Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study

Ewan Carr, Rebecca Bendayan, Daniel Bean, Matt Stammers, Wenjuan Wang, and 43 more

Background: The National Early Warning Score (NEWS2) is currently recommended in the UK for risk stratification of COVID-19 patients, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for the prediction of severe COVID-19 outcome and to identify and validate a set of blood and physiological parameters, routinely collected at hospital admission, to improve upon the use of NEWS2 alone for medium-term risk stratification. Methods: Training cohorts comprised 1276 patients admitted to King’s College Hospital National Health Service...

10.1186/s12916-020-01893-3 article EN cc-by BMC Medicine 2021-01-21

ACE-inhibitors and Angiotensin-2 Receptor Blockers are not associated with severe SARS-COVID19 infection in a multi-site UK acute Hospital Trust

Daniel Bean, Željko Kraljević, Thomas Searle, Rebecca Bendayan, Kevin O’Gallagher, and 9 more

Aims: The SARS-Cov2 virus binds to the ACE2 receptor for cell entry. It has been suggested that ACE-inhibitors (ACEi) and Angiotensin-2 Receptor Blockers (ARB), which are commonly used in patients with hypertension or diabetes and may raise ACE2 levels, could increase the risk of severe COVID19 infection. Methods and Results: We evaluated this hypothesis in a consecutive cohort of 1200 acute inpatients at two hospitals with a multi-ethnic catchment population in London (UK). The mean age was 68±17 years (57% male) and 74% had at least 1...

10.1101/2020.04.07.20056788 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2020-04-11

DSRC 2—Industry-oriented compression of FASTQ files

Łukasz Roguski, Sebastian Deorowicz

Summary: Modern sequencing platforms produce huge amounts of data. Archiving them raises major problems but is crucial for reproducibility of results, one of the most fundamental principles of science. The widely used gzip compressor, used for reduction of storage and transfer costs, is not a perfect solution, so a few specialized FASTQ compressors were proposed recently. Unfortunately, they are often impractical because of slow processing, lack of support for some variants of FASTQ files or instability. We propose DSRC 2 that...

10.1093/bioinformatics/btu208 article EN Bioinformatics 2014-04-18

Disk-based compression of data from genome sequencing

Szymon Grabowski, Sebastian Deorowicz, Łukasz Roguski

Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk based, where the better of these two, from Cox et al. (2012), is based on the Burrows–Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome...

10.1093/bioinformatics/btu844 article EN Bioinformatics 2014-12-22

Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-Automated Simulation Based on the LeoPARDS Trial

Hegler Tissot, Anoop D Shah, David Brealey, Steve Harris, Ruth Agbakoba, and 5 more

Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records (EHR) may help, but much of the information is in free text rather than a computable form. We applied natural language processing (NLP) to EHR data using the CogStack platform to simulate recruitment into the LeoPARDS study,...

10.1109/jbhi.2020.2977925 article EN IEEE Journal of Biomedical and Health Informatics 2020-03-10

Evaluation and Improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study

Ewan Carr, Rebecca Bendayan, Daniel Bean, Matt Stammers, Wenjuan Wang, and 34 more

Background: The National Early Warning Score (NEWS2) is currently recommended in the United Kingdom for risk stratification of COVID outcomes, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for severe COVID outcome and to identify and validate a set of routinely-collected blood and physiological parameters taken at hospital admission to improve the score. Methods: Training cohorts comprised 1276 patients admitted to King’s College Hospital NHS Foundation Trust with COVID-19 disease from 1...

10.1101/2020.04.24.20078006 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2020-04-29

FaStore: a space-saving solution for raw sequencing data

Łukasz Roguski, Idoia Ochoa, Mikel Hernáez, Sebastian Deorowicz

The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw data. These data must be stored, processed and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. FaStore does not use any reference sequences for compression and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. FaStore in the lossless mode achieves an improvement in compression ratio with respect...

10.1093/bioinformatics/bty205 article EN Bioinformatics 2018-03-27

Challenges and opportunities of using transformer-based multi-task learning in NLP through ML lifecycle: A position paper

Lovre Torbarina, Tin Ferkovic, Łukasz Roguski, Velimir Mihelčić, Bruno Šarlija, and 1 more

The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning (ML) systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training rather than separate...

10.1016/j.nlp.2024.100076 article EN cc-by-nc-nd Natural Language Processing Journal 2024-04-30

Implementation of a Real-Time Psychosis Risk Detection and Alerting System Based on Electronic Health Records using CogStack

Tao Wang, Dominic Oliver, Yamiko Joseph Msosa, Craig Colling, Giulia Spada, and 6 more

Recent studies have shown that an automated, lifespan-inclusive, transdiagnostic, and clinically based, individualized risk calculator provides a powerful system for supporting the early detection of individuals at-risk of psychosis at a large scale, by leveraging electronic health records (EHRs). This risk calculator has been externally validated twice and is undergoing feasibility testing for clinical implementation. Integration of this risk calculator in clinical routine should be facilitated by prospective studies, which are required to address...

10.3791/60794 article EN Journal of Visualized Experiments 2020-05-15

Deployment of a Free-Text Analytics Platform at a UK National Health Service Research Hospital: CogStack at University College London Hospitals

Kawsar Noor, Łukasz Roguski, Xi Bai, Álex Handy, Roman Klapaukh, and 9 more

As more health care organizations transition to using electronic health record (EHR) systems, it is important for these organizations to maximize the secondary use of their data to support service improvement and clinical research. These organizations will find it challenging to have systems capable of harnessing the unstructured data fields in the record (clinical notes, letters, etc) and, more practically, to have such systems interact with all of the hospital data systems (legacy and current).

10.2196/38122 article EN cc-by JMIR Medical Informatics 2022-08-24

MedCAT -- Medical Concept Annotation Tool

Željko Kraljević, Daniel Bean, Aurelie Mascio, Łukasz Roguski, Amos Folarin, and 3 more

Biomedical documents such as Electronic Health Records (EHRs) contain a large amount of information in an unstructured format. The data in EHRs is a hugely valuable resource documenting clinical narratives and decisions, but whilst the text can be easily understood by human doctors it is challenging to use in research applications. To uncover the potential of biomedical documents we need to extract and structure the information they contain. The task at hand is Named Entity Recognition and Linking (NER+L). The number of entities, ambiguity of words, overlapping...

10.48550/arxiv.1912.10166 preprint EN other-oa arXiv (Cornell University) 2019-01-01

CARGO: effective format-free compressed storage of genomic information

Łukasz Roguski, Paolo Ribeca

The recent super-exponential growth in the amount of sequencing data generated worldwide has put techniques for compressed storage into focus. Most available solutions, however, are strictly tied to specific bioinformatics formats, sometimes inheriting from them suboptimal design choices; this hinders flexible and effective data sharing. Here we present CARGO (Compressed ARchiving for GenOmics), a high-level framework to automatically generate software systems optimized for arbitrary types of large genomic...

10.1093/nar/gkw318 article EN cc-by Nucleic Acids Research 2016-04-29

Implementation of a Real-Time Psychosis Risk Detection and Alerting System Based on Electronic Health Records using CogStack

Tao Wang, Dominic Oliver, Yamiko Joseph Msosa, Craig Colling, Giulia Spada, and 6 more

10.3791/60794-v article EN Journal of Visualized Experiments 2020-05-15

Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey

Lovre Torbarina, Tin Ferkovic, Łukasz Roguski, Velimir Mihelčić, Bruno Šarlija, and 1 more

The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training rather than separate models. Motivated...

10.48550/arxiv.2308.08234 preprint EN cc-by arXiv (Cornell University) 2023-01-01

FaStore – a space-saving solution for raw sequencing data

Łukasz Roguski, Idoia Ochoa, Mikel Hernáez, Sebastian Deorowicz

The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw data. These data must be stored, processed, and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. The proposed algorithm does not use any reference sequences for compression, and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. We demonstrate through extensive...

10.1101/168096 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2017-07-25

Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-automated Simulation Based on the LeoPARDS Trial

Hegler Tissot, Anoop D Shah, Ruth Agbakoba, Amos Folarin, Luis Romão, and 5 more

Clinical trials often fail on recruiting an adequate number of appropriate patients. Identifying eligible trial participants is a resource-intensive task when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records has been explored as a way of identifying trial participants, but much of the information is in unstructured free text rather than a computable form. We developed an electronic health record pipeline that combines structured data...

10.1101/19005603 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2019-09-09

Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit

Željko Kraljević, Thomas Searle, Anthony Shek, Łukasz Roguski, Kawsar Noor, and 13 more

Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customising and training IE models; and c) integrations to the broader CogStack...

10.48550/arxiv.2010.01165 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Disk-based genome sequencing data compression

Szymon Grabowski, Sebastian Deorowicz, Łukasz Roguski

Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk-based (Yanovsky, 2011; Cox et al., 2012), where the better of these two, from Cox et al. (2012), is based on the Burrows–Wheeler transform (BWT) and achieves 0.518 bits per...

10.48550/arxiv.1405.6874 preprint EN other-oa arXiv (Cornell University) 2014-01-01

FLUID MOTION MODELLING USING VORTEX PARTICLE METHOD ON GPU

Łukasz Roguski, Sebastian Deorowicz

In this paper we present the vortex-in-cell method aimed at graphics processing units. An inviscid fluid model is examined in a domain with periodic boundary conditions. The leap-frogging vortex rings simulation results are presented, along with a sample collision visualization. At the end, the GPU solver's performance advantage over the CPU is presented.

10.21936/si2013_v34.n1.5 article EN Studia Informatica System and information technology 2013-02-09

Speeding Up Transformer Training By Using Dataset Subsampling - An Exploratory Analysis

Lovre Torbarina, Velimir Mihelčić, Bruno Šarlija, Łukasz Roguski, Željko Kraljević

Transformer-based models have greatly advanced the progress in the field of natural language processing and while they achieve state-of-the-art results on a wide range of tasks, they are cumbersome in parameter size. Subsequently, even when pre-trained transformer models are used for fine-tuning on a given task, if the dataset is large, it may still not be feasible to fine-tune the model within a reasonable time. For this reason, we empirically test 8 subsampling methods for reducing the dataset size on a text classification task and report the trade-off...

10.18653/v1/2021.sustainlp-1.11 article EN cc-by 2021-01-01
