David Zhang

ORCID: 0000-0003-2382-8460
Research Areas
  • RNA Research and Splicing
  • RNA modifications and cancer
  • RNA and protein synthesis mechanisms
  • Parkinson's Disease Mechanisms and Treatments
  • Neurological diseases and metabolism
  • Mitochondrial Function and Pathology
  • Molecular Biology Techniques and Applications
  • Genomics and Rare Diseases
  • Cellular transport and secretion
  • Genetic Associations and Epidemiology
  • Cancer-related molecular mechanisms research
  • Autophagy in Disease and Therapy
  • Single-cell and spatial transcriptomics
  • Genomics and Phylogenetic Studies
  • Bioinformatics and Genomic Networks
  • Genetic Neurodegenerative Diseases
  • CRISPR and Genetic Engineering
  • Image Processing and 3D Reconstruction
  • RNA regulation and disease
  • Genomics and Chromatin Dynamics
  • Genomic variations and chromosomal abnormalities
  • Multimodal Machine Learning Applications
  • Genetics and Neurodevelopmental Disorders
  • Cell Image Analysis Techniques
  • Prenatal Screening and Diagnostics

Great Ormond Street Hospital
2020-2023

University College London
2018-2023

Yale University
2023

Research Network (United States)
2022

NIHR Great Ormond Street Hospital Biomedical Research Centre
2022

National Hospital for Neurology and Neurosurgery
2022

Infinity (United States)
2020

MRC Prion Unit
2019

Neurosciences Institute
2018

Abstract: Motivation: The advent of long-read sequencing technologies has increased demand for the visualization and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2, making it easy to use. Availability...

10.1093/bioinformatics/btac409 article EN cc-by Bioinformatics 2022-06-25
Kimberley J. Billingsley Inês A. Barbosa Sara Bandrés‐Ciga John P. Quinn Vivien J. Bubb and 95 more Charu Deshpande Juan A. Botía Regina H. Reynolds David Zhang Michael A. Simpson Cornelis Blauwendraat Ziv Gan‐Or J. Raphael Gibbs Mike A. Nalls Andrew Singleton Alastair Noyce Arianna Tucci Ben Middlehurst Demis A. Kia Mingpu Tan Henry Houlden Huw R. Morris Hélène Plun‐Favreau Peter Holmans John Hardy Daniah Trabzuni José Brás Kin Y. Mok Kerri J. Kinghorn Nicholas Wood Patrick A. Lewis Rita Guerreiro Ruth C. Lovering Lea R’Bibo Mie Rizig Valentina Escott‐Price Viorica Chelban Thomas Foltynie N. Williams Alexis Brice Alexis Brice Suzanne Lesage María Martínez Ayush Giri Claudia Schulte Kathrin Brockmann Javier Simón‐Sánchez Peter Heutink Patrizia Rizzu Manu Sharma Thomas Gasser Aude Nicolas Mark Cookson Faraz Faghri Dena Hernández J. Shulman Laurie Robak Steven Lubbe Steven Finkbeiner Niccolò E. Mencacci Codrin Lungu Sonja W. Scholz Xylena Reed Hampton L. Leonard Guy A. Rouleau Lynne Krohan JJ van Hilten Johan Marinus Astrid Adarmes‐Gómez M. Aguilar Ignacio Álvarez Victoria Álvarez Francisco Javier Barrero J. Bergareche Yarza Inmaculada Bernal‐Bernal Marta Blázquez Estrada Magally Bernal María Teresa Boungiorno Dolores Buiza‐Rueda Ana Cámara María Cárcel F. Carrillo Mario Carrión‐Claro Debora Cerdan Jordi Clarimón Yaroslau Compta Mónica Díez-Fairén Oriol Dols‐Icardo J. Duarte R. l. Duran Francisco Escamilla‐Sevilla Mario Ezquerra Manel Fernández Rubén Fernández‐Santiago C. Garćıa Pedro Ruiz Pilar Gómez‐Garre Mégane Heredia Isabel González Aramburu Ana Gorostidi Pagola

Abstract: Mitochondrial dysfunction has been implicated in the etiology of monogenic Parkinson’s disease (PD). Yet the role that mitochondrial processes play in the most common form of the disease, sporadic PD, is yet to be fully established. Here, we comprehensively assessed the role of mitochondrial function-associated genes in sporadic PD by leveraging improvements in the scale and analysis of PD GWAS data with recent advances in our understanding of the genetics of mitochondrial disease. We calculated a mitochondrial-specific polygenic risk score (PRS) and showed that cumulative small effect...

10.1038/s41531-019-0080-x article EN cc-by npj Parkinson’s Disease 2019-05-22

Abstract: Parkinson’s disease is a common incurable neurodegenerative disease. The identification of genetic variants via genome-wide association studies has considerably advanced our understanding of the Parkinson’s disease genetic risk. Understanding the functional significance of the risk loci is now a critical step towards translating these genetic advances into an enhanced biological understanding of the disease. Impaired mitophagy is a key causative pathway in familial Parkinson’s disease, but its relevance to idiopathic Parkinson’s disease is unclear. We used a mitophagy screening assay to evaluate the functional significance of risk genes identified through...

10.1093/brain/awac325 article EN cc-by Brain 2022-09-08

Growing evidence suggests that human gene annotation remains incomplete; however, it is unclear how this affects different tissues and our understanding of different disorders. Here, we detect previously unannotated transcription from Genotype-Tissue Expression RNA sequencing data across 41 human tissues. We connect this unannotated transcription to known genes, confirming that human gene annotation remains incomplete, even among well-studied genes including 63% of the Online Mendelian Inheritance in Man-morbid catalog and 317 neurodegeneration-associated genes. We find the greatest...

10.1126/sciadv.aay8299 article EN cc-by-nc Science Advances 2020-06-10

The basis for clinical variation related to underlying progressive supranuclear palsy (PSP) pathology is unknown. We performed a genome-wide association study (GWAS) to identify genetic determinants of PSP phenotype. Two independent pathological and clinically diagnosed PSP cohorts were genotyped and phenotyped to create Richardson syndrome (RS) and non-RS groups. We carried out separate logistic regression GWASs to compare RS and non-RS groups and then combined datasets to carry out a whole cohort analysis (RS = 367, non-RS = 130). We validated our...

10.1002/ana.25308 article EN cc-by Annals of Neurology 2018-08-01
Sebastian Guelfi Karishma D’Sa Juan A. Botía Jana Vandrovcová Regina H. Reynolds and 95 more David Zhang Daniah Trabzuni Leonardo Collado‐Torres Andrew Thomason Pedro Quijada Leyton Sarah A. Gagliano Taliun Mike A. Nalls Alastair Noyce Aude Nicolas Mark Cookson Sara Bandrés‐Ciga J. Raphael Gibbs Dena Hernández Andrew Singleton Xylena Reed Hampton L. Leonard Cornelis Blauwendraat Faraz Faghri José Brás Rita Guerreiro Arianna Tucci Demis A. Kia Henry Houlden Hélène Plun‐Favreau Kin Y. Mok Nicholas Wood Ruth C. Lovering Lea R’Bibo Mie Rizig Viorica Chelban Manuela Tan Huw R. Morris Ben Middlehurst John P. Quinn Kimberley Billingsley Peter Holmans Kerri J. Kinghorn Patrick A. Lewis Valentina Escott‐Price Nigel Williams Thomas Foltynie Alexis Brice Alexis Brice Suzanne Lesage Jean‐Christophe Corvol María Martínez Anamika Giri Claudia Schulte Kathrin Brockmann Javier Simón‐Sánchez Peter Heutink Thomas Gasser Patrizia Rizzu Manu Sharma Joshua Shulman Laurie Robak Steven Lubbe Niccolò E. Mencacci Steven Finkbeiner Codrin Lungu Sonja W. Scholz Ziv Gan‐Or Guy A. Rouleau Lynne Krohan Jacobus J. van Hilten Johan Marinus Astrid Adarmes‐Gómez Inmaculada Bernal‐Bernal Marta Bonilla‐Toribio Dolores Buiza‐Rueda Fátima Carrillo Mario Carrión‐Claro Pablo Mir Pilar Gómez‐Garre Silvia Jesús Miguel A. Labrador‐Espinosa Daniel Macías Laura Vargas‐González Carlota Méndez‐del‐Barrio María Teresa Periñán Cristina Tejera‐Parrado Mónica Díez-Fairén Miquel Aguilar Ignacio Álvarez María Teresa Boungiorno María Cárcel Pau Pástor Juan Pablo Tartari Victoria Álvarez Manuel Menéndez‐González Marta Blázquez Estrada Ciara García Esther Suárez-Sanmartín Francisco Javier Barrero Elisabet Mondragón Rezola

Abstract: Genome-wide association studies have generated an increasing number of common genetic variants associated with neurological and psychiatric disease risk. An improved understanding of the genetic control of gene expression in human brain is vital considering this is the likely modus operandum for many causal variants. However, human brain sampling complexities limit the explanatory power of brain-related expression quantitative trait loci (eQTL) and allele-specific expression (ASE) signals. We address this, using paired genomic and transcriptomic data...

10.1038/s41467-020-14483-x article EN cc-by Nature Communications 2020-02-25

Abstract: We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into “cell sentences,” C2S bridges the gap between natural language processing and biology. We demonstrate cell sentences enable the fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell-type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when...

10.1101/2023.09.11.557287 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-09-14

Abstract: Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Associated pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age-of-onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome...

10.1093/brain/awad009 article EN cc-by Brain 2023-01-10

A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 BigWig files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor...

10.1093/bioinformatics/btab152 article EN cc-by Bioinformatics 2021-03-04

Abstract: Single-cell RNA sequencing has transformed our understanding of cellular diversity, yet current single-cell foundation models (scFMs) remain limited in their scalability, flexibility across diverse tasks, and ability to natively integrate textual information. In this work, we build upon the Cell2Sentence (C2S) framework, which represents scRNA-seq profiles as “cell sentences,” to train Large Language Models (LLMs) on a corpus comprising over one billion tokens of transcriptomic data,...

10.1101/2025.04.14.648850 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2025-04-17

Abstract: We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private...

10.1101/2021.05.21.445138 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-05-23

Dysregulation of RNA splicing contributes to both rare and complex diseases. RNA-sequencing data from human tissues has shown that this process can be inaccurate, resulting in the presence of novel introns detected at low frequency across samples and within an individual. To enable the full spectrum of intron use to be explored, we have developed IntroVerse, which offers an extensive catalogue on the splicing of 332,571 annotated introns and a linked set of 4,679,474 novel junctions covering 32,669 different genes. This dataset has been generated through...

10.1093/nar/gkac1056 article EN cc-by Nucleic Acids Research 2022-10-31

Abstract: Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples and 42 human body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that mis-splicing occurs at different rates across introns and tissues and that these inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its...

10.1101/2023.03.29.534370 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2023-03-30

Abstract: The human genome contains numerous duplicated regions, such as parent-pseudogene pairs, causing sequencing reads to align equally well to either gene. The extent to which this ambiguity complicates transcriptomic analyses is currently unknown. This is concerning as many parent genes have been linked to disease, including GBA1, causally linked to both Parkinson’s and Gaucher disease. We find that most of the short sequencing reads that map to GBA1 also map to its pseudogene, GBAP1. Using long-read RNA-sequencing in human brain, where all reads mapped...

10.1101/2022.10.21.513169 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-10-21

Abstract: Motivation: The advent of long-read sequencing technologies has increased demand for the visualisation and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2, making it easy to use. Availability and implementation: ggtranscript is...

10.1101/2022.03.28.486050 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-03-29

Abstract: Motivation: A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Results: Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19,000 GTExV8 BigWig files in approximately one hour using 32 threads. Megadepth is available both as a command-line...

10.1101/2020.12.17.423317 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2020-12-18

Abstract: Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Associated pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age-of-onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome...

10.1101/2022.06.24.22276803 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2022-06-27

Abstract: Objective: The basis for clinical variation related to underlying Progressive Supranuclear Palsy (PSP) pathology is unknown. We performed a genome wide association study (GWAS) to identify genetic determinants of PSP phenotype. Methods: Two independent pathological and clinically diagnosed PSP cohorts were genotyped and phenotyped to create Richardson’s syndrome (RS) and non-RS groups. We carried out separate logistic regression GWAS to compare RS and non-RS groups and then combined datasets to carry out a whole cohort analysis...

10.1101/333195 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2018-05-30

Abstract: Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with various human phenotypes and many of these loci are thought to act at a molecular level by regulating gene expression. Detection of allele specific expression (ASE), namely preferential usage of an allele at a transcribed locus, is an increasingly important means of studying the genetic regulation of gene expression. However, there is currently a paucity of tools available to link ASE sites with GWAS risk loci. Existing integration methods first use...

10.1101/600411 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2019-04-05

Vertically-Stacked Single Segment Activation (V-SSA) as a Programming Approach for Directional DBS in Globus Pallidus Internus (GPi) Parkinson’s Disease (PD) Patients: First Clinical Case Series (5499). Muhammad Anjum, Islam Fayad, Yasar Torres-Yaghi, Srivatsan Pallavaram, Christopher Kalhorn, Fahd Amjad, David Zhang, and Fernando Pagan. Sunday, April 26. April 14, 2020. April 2020 issue, 94 (15_supplement). https://doi.org/10.1212/WNL.94.15_supplement.5499

10.1212/wnl.94.15_supplement.5499 article EN Neurology 2020-04-14

This protocol describes the Bioinformatic Prioritisation of PD GWAS candidates for High Content Screening, and Hit Validation by allele-specific expression (ASE) analysis.

10.17504/protocols.io.3byl4br2zvo5/v1 preprint EN 2022-05-18