David Zhang
- RNA Research and Splicing
- RNA modifications and cancer
- RNA and protein synthesis mechanisms
- Parkinson's Disease Mechanisms and Treatments
- Neurological diseases and metabolism
- Mitochondrial Function and Pathology
- Molecular Biology Techniques and Applications
- Genomics and Rare Diseases
- Cellular transport and secretion
- Genetic Associations and Epidemiology
- Cancer-related molecular mechanisms research
- Autophagy in Disease and Therapy
- Single-cell and spatial transcriptomics
- Genomics and Phylogenetic Studies
- Bioinformatics and Genomic Networks
- Genetic Neurodegenerative Diseases
- CRISPR and Genetic Engineering
- Image Processing and 3D Reconstruction
- RNA regulation and disease
- Genomics and Chromatin Dynamics
- Genomic variations and chromosomal abnormalities
- Multimodal Machine Learning Applications
- Genetics and Neurodevelopmental Disorders
- Cell Image Analysis Techniques
- Prenatal Screening and Diagnostics
Great Ormond Street Hospital
2020-2023
University College London
2018-2023
Yale University
2023
Research Network (United States)
2022
NIHR Great Ormond Street Hospital Biomedical Research Centre
2022
National Hospital for Neurology and Neurosurgery
2022
Infinity (United States)
2020
MRC Prion Unit
2019
Neurosciences Institute
2018
Abstract Motivation: The advent of long-read sequencing technologies has increased demand for the visualization and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2, making it easy to use. Availability...
Abstract Mitochondrial dysfunction has been implicated in the etiology of monogenic Parkinson’s disease (PD). Yet the role that mitochondrial processes play in the most common form of the disease, sporadic PD, is yet to be fully established. Here, we comprehensively assessed the role of mitochondrial function-associated genes in sporadic PD by leveraging improvements in the scale and analysis of PD GWAS data with recent advances in our understanding of the genetics of mitochondrial disease. We calculated a mitochondrial-specific polygenic risk score (PRS) and showed that cumulative small effect...
Abstract Parkinson’s disease is a common incurable neurodegenerative disease. The identification of genetic variants via genome-wide association studies has considerably advanced our understanding of the genetic risk. Understanding the functional significance of the risk loci is now a critical step towards translating these genetic advances into an enhanced biological understanding of the disease. Impaired mitophagy is a key causative pathway in familial Parkinson’s disease, but its relevance to idiopathic Parkinson’s disease is unclear. We used a mitophagy screening assay to evaluate the functional significance of risk genes identified through...
Growing evidence suggests that human gene annotation remains incomplete; however, it is unclear how this affects different tissues and our understanding of different disorders. Here, we detect previously unannotated transcription from Genotype-Tissue Expression RNA sequencing data across 41 human tissues. We connect this unannotated transcription to known genes, confirming that human gene annotation remains incomplete, even among well-studied genes including 63% of the Online Mendelian Inheritance in Man-morbid catalog and 317 neurodegeneration-associated genes. We find the greatest...
The basis for clinical variation related to underlying progressive supranuclear palsy (PSP) pathology is unknown. We performed a genome-wide association study (GWAS) to identify genetic determinants of PSP phenotype. Two independent pathological and clinically diagnosed PSP cohorts were genotyped and phenotyped to create Richardson syndrome (RS) and non-RS groups. We carried out separate logistic regression GWASs to compare RS and non-RS groups and then combined datasets to carry out a whole cohort analysis (RS = 367, non-RS = 130). We validated our...
Abstract Genome-wide association studies have generated an increasing number of common genetic variants associated with neurological and psychiatric disease risk. An improved understanding of the genetic control of gene expression in human brain is vital considering this is the likely modus operandum for many causal variants. However, human brain sampling complexities limit the explanatory power of brain-related expression quantitative trait loci (eQTL) and allele-specific expression (ASE) signals. We address this, using paired genomic and transcriptomic data...
Abstract We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into “cell sentences,” C2S bridges the gap between natural language processing and biology. We demonstrate cell sentences enable the fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell-type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when...
Abstract Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Associated pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age-of-onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome...
A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 BigWig files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor...
ABSTRACT Single-cell RNA sequencing has transformed our understanding of cellular diversity, yet current single-cell foundation models (scFMs) remain limited in their scalability, flexibility across diverse tasks, and ability to natively integrate textual information. In this work, we build upon the Cell2Sentence (C2S) framework, which represents scRNA-seq profiles as “cell sentences,” to train Large Language Models (LLMs) on a corpus comprising over one billion tokens of transcriptomic data,...
ABSTRACT We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private...
Dysregulation of RNA splicing contributes to both rare and complex diseases. RNA-sequencing data from human tissues has shown that this process can be inaccurate, resulting in the presence of novel introns detected at low frequency across samples and within an individual. To enable the full spectrum of intron use to be explored, we have developed IntroVerse, which offers an extensive catalogue on the splicing of 332,571 annotated introns and a linked set of 4,679,474 novel junctions covering 32,669 different genes. This dataset has been generated through...
Abstract Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples and 42 body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that mis-splicing occurs at different rates across introns and tissues and that these inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its...
ABSTRACT The human genome contains numerous duplicated regions, such as parent-pseudogene pairs, causing sequencing reads to align equally well to either gene. The extent to which this ambiguity complicates transcriptomic analyses is currently unknown. This is concerning as many parent genes have been linked to disease, including GBA1, causally linked to both Parkinson’s and Gaucher disease. We find that most of the short sequencing reads that map to GBA1 also map to its pseudogene, GBAP1. Using long-read RNA-sequencing in brain, where all reads mapped...
Abstract Motivation: The advent of long-read sequencing technologies has increased demand for the visualisation and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2, making it easy to use. Availability and implementation: ggtranscript is...
Abstract Motivation: A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Results: Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19,000 GTExV8 BigWig files in approximately one hour using 32 threads. Megadepth is available both as a command-line...
Abstract Objective: The basis for clinical variation related to underlying Progressive Supranuclear Palsy (PSP) pathology is unknown. We performed a genome-wide association study (GWAS) to identify genetic determinants of PSP phenotype. Methods: Two independent pathological and clinically diagnosed PSP cohorts were genotyped and phenotyped to create Richardson’s syndrome (RS) and non-RS groups. We carried out separate logistic regression GWAS to compare RS and non-RS groups and then combined datasets to carry out a whole cohort analysis...
Abstract Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with various human phenotypes and many of these loci are thought to act at a molecular level by regulating gene expression. Detection of allele-specific expression (ASE), namely preferential usage of an allele at a transcribed locus, is an increasingly important means of studying the genetic regulation of gene expression. However, there is currently a paucity of tools available to link ASE sites with GWAS risk loci. Existing integration methods first use...
Vertically-Stacked Single Segment Activation (V-SSA) as a Programming Approach for Directional DBS in Globus Pallidus Internus (GPi) Parkinson’s Disease (PD) Patients: First Clinical Case Series (5499). Muhammad Anjum, Islam Fayad, Yasar Torres-Yaghi, Srivatsan Pallavaram, Christopher Kalhorn, Fahd Amjad, David Zhang, and Fernando Pagan. Sunday, April 26. April 14, 2020. April 2020 issue, 94 (15_supplement). https://doi.org/10.1212/WNL.94.15_supplement.5499
This protocol describes the Bioinformatic Prioritisation of PD GWAS candidates for High Content Screening, and Hit Validation by allele-specific expression (ASE) analysis.