NFDI4DS | UHH-SEMS - Publication Details

ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2

OPENALEX - Publications

Emil K. Gustavsson David Zhang Regina H. Reynolds Sonia García-Ruiz Mina Ryten

Abstract. Motivation: The advent of long-read sequencing technologies has increased demand for the visualization and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2, making it easy to use. Availability...

10.1093/bioinformatics/btac409 article EN cc-by Bioinformatics 2022-06-25

Mitochondria function associated genes contribute to Parkinson’s Disease risk and later age at onset

Kimberley J. Billingsley Inês A. Barbosa Sara Bandrés‐Ciga John P. Quinn Vivien J. Bubb and 95 more

Abstract. Mitochondrial dysfunction has been implicated in the etiology of monogenic Parkinson’s disease (PD). Yet the role that mitochondrial processes play in the most common form of the disease, sporadic PD, is yet to be fully established. Here, we comprehensively assessed the role of mitochondrial function-associated genes in sporadic PD by leveraging improvements in the scale and analysis of PD GWAS data with recent advances in our understanding of the genetics of mitochondrial disease. We calculated a mitochondrial-specific polygenic risk score (PRS) and showed that cumulative small effect...

10.1038/s41531-019-0080-x article EN cc-by npj Parkinson’s Disease 2019-05-22

Regulation of mitophagy by the NSL complex underlies genetic risk for Parkinson’s disease at 16q11.2 and MAPT H1 loci

Marc P. M. Soutar Daniela Melandri Benjamin O’Callaghan Emily Annuario Amy E. Monaghan and 24 more

Abstract. Parkinson’s disease is a common incurable neurodegenerative disease. The identification of genetic variants via genome-wide association studies has considerably advanced our understanding of the genetic risk for Parkinson’s disease. Understanding the functional significance of the risk loci is now a critical step towards translating these advances into an enhanced biological understanding of the disease. Impaired mitophagy is a key causative pathway in familial Parkinson’s disease, but its relevance to idiopathic Parkinson’s disease is unclear. We used a mitophagy screening assay to evaluate genes identified through...

10.1093/brain/awac325 article EN cc-by Brain 2022-09-08

Incomplete annotation has a disproportionate impact on our understanding of Mendelian and complex neurogenetic disorders

David Zhang Sebastian Guelfi Sonia García-Ruiz Beatrice Costa Regina H. Reynolds and 9 more

Growing evidence suggests that human gene annotation remains incomplete; however, it is unclear how this affects different tissues and our understanding of different disorders. Here, we detect previously unannotated transcription from Genotype-Tissue Expression RNA sequencing data across 41 human tissues. We connect this unannotated transcription to known genes, confirming that human gene annotation remains incomplete, even among well-studied genes, including 63% of the Online Mendelian Inheritance in Man-morbid catalog and 317 neurodegeneration-associated genes. We find the greatest...

10.1126/sciadv.aay8299 article EN cc-by-nc Science Advances 2020-06-10

Variation at the TRIM11 locus modifies progressive supranuclear palsy phenotype

Edwin Jabbari John Woodside Manuela Tan Maryam Shoai Alan Pittman and 18 more

The basis for clinical variation related to underlying progressive supranuclear palsy (PSP) pathology is unknown. We performed a genome-wide association study (GWAS) to identify genetic determinants of PSP phenotype. Two independent pathological and clinically diagnosed PSP cohorts were genotyped and phenotyped to create Richardson syndrome (RS) and non-RS groups. We carried out separate logistic regression GWASs to compare RS and non-RS groups and then combined the datasets to carry out a whole cohort analysis (RS = 367, non-RS = 130). We validated our...

10.1002/ana.25308 article EN cc-by Annals of Neurology 2018-08-01

Regulatory sites for splicing in human basal ganglia are enriched for disease-relevant information

Sebastian Guelfi Karishma D’Sa Juan A. Botía Jana Vandrovcová Regina H. Reynolds and 95 more

Abstract. Genome-wide association studies have generated an increasing number of common genetic variants associated with neurological and psychiatric disease risk. An improved understanding of the control of gene expression in human brain is vital, considering this is the likely modus operandum for many causal variants. However, human brain sampling complexities limit the explanatory power of brain-related expression quantitative trait loci (eQTL) and allele-specific expression (ASE) signals. We address this, using paired genomic and transcriptomic data...

10.1038/s41467-020-14483-x article EN cc-by Nature Communications 2020-02-25

Cell2Sentence: Teaching Large Language Models the Language of Biology

Daniel Lévine Syed Asad Rizvi Sacha Lévy Nazreen Pallikkavaliyaveetil David Zhang and 13 more

Abstract. We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into “cell sentences,” C2S bridges the gap between natural language processing and biology. We demonstrate that cell sentences enable fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell-type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when...

10.1101/2023.09.11.557287 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-09-14

Functional genomics provide key insights to improve the diagnostic yield of hereditary ataxia

Zhongbo Chen Arianna Tucci Valentina Cipriani Emil K. Gustavsson Kristina Ibáñez and 72 more

Abstract. Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Associated pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age-of-onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome...

10.1093/brain/awad009 article EN cc-by Brain 2023-01-10

Megadepth: efficient coverage quantification for BigWigs and BAMs

Christopher Wilks Omar Ahmed Daniel Baker David Zhang Leonardo Collado‐Torres and 1 more

A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor...

10.1093/bioinformatics/btab152 article EN cc-by Bioinformatics 2021-03-04

Scaling Large Language Models for Next-Generation Single-Cell Analysis

Syed Asad Rizvi Daniel Lévine Aakash Patel Shiyang Zhang Eric Wang and 18 more

Abstract. Single-cell RNA sequencing has transformed our understanding of cellular diversity, yet current single-cell foundation models (scFMs) remain limited in their scalability, flexibility across diverse tasks, and ability to natively integrate textual information. In this work, we build upon the Cell2Sentence (C2S) framework, which represents scRNA-seq profiles as “cell sentences,” to train Large Language Models (LLMs) on a corpus comprising over one billion tokens of transcriptomic data,...

10.1101/2025.04.14.648850 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2025-04-17

recount3: summaries and queries for large-scale RNA-seq expression and splicing

Christopher Wilks Shijie Zheng Feng Yong Chen Rone Charles Brad Solomon and 10 more

Abstract. We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private...

10.1101/2021.05.21.445138 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-05-23

Duplication of 10q24 locus: broadening the clinical and radiological spectrum

Muriel Holder‐Espinasse Aleksander Jamsheer Fabienne Escande Joris Andrieux Florence Petit and 29 more

10.1038/s41431-018-0326-9 article EN European Journal of Human Genetics 2019-01-08

IntroVerse: a comprehensive database of introns across human tissues

Sonia García-Ruiz Emil K. Gustavsson David Zhang Regina H. Reynolds Zhongbo Chen and 5 more

Dysregulation of RNA splicing contributes to both rare and complex diseases. RNA-sequencing data from human tissues has shown that this process can be inaccurate, resulting in the presence of novel introns detected at low frequency across samples within an individual. To enable the full spectrum of intron use to be explored, we have developed IntroVerse, which offers an extensive catalogue on the splicing of 332,571 annotated introns and a linked set of 4,679,474 novel junctions covering 32,669 different genes. This dataset has been generated through...

10.1093/nar/gkac1056 article EN cc-by Nucleic Acids Research 2022-10-31

Splicing accuracy varies across human introns, tissues and age

Sonia García-Ruiz David Zhang Emil K. Gustavsson G Rocamora-Perez Melissa Grant‐Peters and 10 more

Abstract. Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples and 42 human body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that mis-splicing occurs at different rates across introns and tissues and that these inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its...

10.1101/2023.03.29.534370 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2023-03-30

The annotation and function of the Parkinson’s and Gaucher disease-linked gene GBA1 has been concealed by its protein-coding pseudogene GBAP1

Emil K. Gustavsson Siddharth Sethi Yujing Gao Jonathan Brenton Sonia García-Ruiz and 30 more

Abstract. The human genome contains numerous duplicated regions, such as parent-pseudogene pairs, causing sequencing reads to align equally well to either gene. The extent to which this ambiguity complicates transcriptomic analyses is currently unknown. This is concerning, as many parent genes have been linked to disease, including GBA1, which is causally linked to both Parkinson’s and Gaucher disease. We find that most of the short reads that map to GBA1 also map to its pseudogene, GBAP1. Using long-read RNA-sequencing in human brain, where all reads mapped...

10.1101/2022.10.21.513169 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-10-21

ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2

Emil K. Gustavsson David Zhang Regina H. Reynolds Sonia García-Ruiz Mina Ryten

Abstract. Motivation: The advent of long-read sequencing technologies has increased demand for the visualisation and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2, making it easy to use. Availability and implementation: ggtranscript is...

10.1101/2022.03.28.486050 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-03-29

Megadepth: efficient coverage quantification for BigWigs and BAMs

Christopher Wilks Omar Ahmed Daniel Baker David Zhang Leonardo Collado‐Torres and 1 more

Abstract. Motivation: A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Results: Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19,000 GTExV8 files in approximately one hour using 32 threads. Megadepth is available both as a command-line...

10.1101/2020.12.17.423317 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2020-12-18

Functional genomics provide key insights to improve the diagnostic yield of hereditary ataxia

Zhongbo Chen Arianna Tucci Valentina Cipriani Emil K. Gustavsson Kristina Ibáñez and 15 more

Abstract. Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Associated pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age-of-onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome...

10.1101/2022.06.24.22276803 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2022-06-27

Variation at the TRIM11 locus modifies Progressive Supranuclear Palsy phenotype

Edwin Jabbari Jayne V. Woodside MMX Tan Maryam Shoai Alan Pittman and 18 more

Abstract. Objective: The basis for clinical variation related to underlying Progressive Supranuclear Palsy (PSP) pathology is unknown. We performed a genome-wide association study (GWAS) to identify genetic determinants of PSP phenotype. Methods: Two independent pathological and clinically diagnosed PSP cohorts were genotyped and phenotyped to create Richardson’s syndrome (RS) and non-RS groups. We carried out separate logistic regression GWAS to compare RS and non-RS groups and then combined the datasets to carry out a whole cohort analysis...

10.1101/333195 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2018-05-30

ERASE: Extended Randomization for assessment of annotation enrichment in ASE datasets

Karishma D’Sa Regina H. Reynolds Sebastian Guelfi David Zhang Sonia García-Ruiz and 5 more

Abstract. Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with various human phenotypes, and many of these loci are thought to act at a molecular level by regulating gene expression. Detection of allele-specific expression (ASE), namely the preferential usage of an allele at a transcribed locus, is an increasingly important means of studying the regulation of gene expression. However, there is currently a paucity of tools available to link ASE sites to GWAS risk loci. Existing integration methods first use...

10.1101/600411 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2019-04-05

Vertically-Stacked Single Segment Activation (V-SSA) as a Programming Approach for Directional DBS in Globus Pallidus Internus (GPi) in Parkinson’s Disease (PD) Patients: First Clinical Case Series (5499)

Muhammad Anjum Islam Fayad Yasar Torres‐Yaghi Srivatsan Pallavaram Christopher Kalhorn and 3 more

Sunday, April 26. Published April 14, 2020, in Neurology 94 (15_supplement), April 2020 issue. Authors: Muhammad Anjum, Islam Fayad, Yasar Torres-Yaghi, Srivatsan Pallavaram, Christopher Kalhorn, Fahd Amjad, David Zhang, and Fernando Pagan.

10.1212/wnl.94.15_supplement.5499 article EN Neurology 2020-04-14

Regulation of mitophagy by the NSL complex underlies genetic risk for Parkinson’s disease: Bioinformatic Prioritisation and Hit Validation v1

Karishma D’Sa Sebastian Guelfi David Zhang Alan Pittman Daniah Trabzuni and 5 more

This protocol describes the Bioinformatic Prioritisation of PD GWAS candidates for High Content Screening, and Hit Validation by allele-specific expression (ASE) analysis.

10.17504/protocols.io.3byl4br2zvo5/v1 preprint EN 2022-05-18