Justin Chu
- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Chromosomal and Genetic Variations
- Algorithms and Data Compression
- Bacteriophages and microbial interactions
- RNA modifications and cancer
- Genetic diversity and population structure
- Molecular Biology Techniques and Applications
- Advanced biosensing and bioanalysis techniques
- Gene expression and cancer classification
- Genomics and Chromatin Dynamics
- Identification and Quantification in Food
- Data Mining Algorithms and Applications
- Genomic variations and chromosomal abnormalities
- Retinal Diseases and Treatments
- Genetics, Bioinformatics, and Biomedical Research
- Nanopore and Nanochannel Transport Studies
- Retinal Development and Disorders
- Plant nutrient uptake and metabolism
- Genetic factors in colorectal cancer
- Genomics and Rare Diseases
- Epigenetics and DNA Methylation
- Energy Efficient Wireless Sensor Networks
- Web Data Mining and Analysis
- Autophagy in Disease and Therapy
Dana-Farber Cancer Institute
2022-2024
Harvard University
2023-2024
Lemuel Shattuck Hospital
2023
Canada's Michael Smith Genome Sciences Centre
2014-2022
University of British Columbia
2014-2022
BC Cancer Agency
2010-2018
Genome British Columbia
2018
Johns Hopkins University
2016
University of California, Irvine
2013-2015
University of California, San Diego
2015
Gastric cancer is a leading cause of cancer deaths, but analysis of its molecular and clinical characteristics has been complicated by histological and aetiological heterogeneity. Here we describe a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project. We propose a molecular classification dividing gastric cancer into four subtypes: tumours positive for Epstein–Barr virus, which display recurrent PIK3CA mutations, extreme DNA hypermethylation, and amplification of JAK2, CD274 (also...
The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps toward elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species, or within individuals, critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading sequencing instruments has increased drastically, and coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes...
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic...
The MinION sequencing instrument from Oxford Nanopore Technologies (ONT) produces long read lengths from single-molecule sequencing - valuable features for detailed genome characterization. To realize the potential of this platform, a number of groups are developing bioinformatics tools tuned to the unique characteristics of its data. We note that these development efforts would benefit from a simulator software, the output of which could be used to benchmark analysis tools.
Motivation: Sequencing of human genomes is now routine, and assembly of shotgun reads is increasingly feasible. However, assemblies often fail to inform about chromosome-scale structure due to a lack of linkage information over long stretches of DNA, a shortcoming that is being addressed by new sequencing protocols, such as the GemCode and Chromium linked reads from 10× Genomics. Results: Here, we present ARCS, an application that utilizes the barcoding information contained in linked reads to further organize draft genomes into highly contiguous...
The most common cause of the neurodegenerative diseases amyotrophic lateral sclerosis and frontotemporal dementia is a hexanucleotide repeat expansion in C9orf72. Here we report a study of the C9orf72 protein by examining the consequences of loss of C9orf72 functions. Deletion of one or both alleles of the C9orf72 gene in mice causes age-dependent lethality phenotypes. We demonstrate that C9orf72 regulates nutrient sensing, as the loss of C9orf72 decreases phosphorylation of the mTOR substrate S6K1. The transcription factor EB (TFEB), a master regulator of lysosomal and autophagy genes,...
The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13), provided a model of their homology, it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain...
The Human Pangenome Reference Consortium (HPRC) presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene...
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions. We find that SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence...
Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify...
Large datasets can be screened for sequences from a specific organism, quickly and with low memory requirements, by a data structure that supports time- and memory-efficient set membership queries. Bloom filters offer such queries but require that false positives be controlled. We present BioBloom Tools, a Bloom filter-based sequence-screening tool that is faster than BWA, Bowtie 2 (popular alignment algorithms) and FACS (a membership query algorithm). It delivers accuracies comparable with these tools, controls false positives and has...
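The set-membership idea in the abstract above can be illustrated with a small Bloom filter. This is a minimal sketch under stated assumptions, not the BioBloom Tools implementation: the class `BloomFilter`, the helpers `kmers` and `hit_fraction`, the SHA-256 double-hashing scheme, and the "fraction of k-mers found" score are all hypothetical choices made here for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array queried with several derived hash positions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive all positions from one SHA-256 digest
        # using double hashing (h1 + i * h2).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hit_fraction(read, bloom, k):
    """Fraction of the read's k-mers that the filter reports as present."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return sum(km in bloom for km in read_kmers) / len(read_kmers)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCTAGCTAGGATCCGATCGATTACG"
    bloom = BloomFilter(num_bits=8192, num_hashes=3)
    for km in kmers(reference, 5):
        bloom.add(km)
    print(hit_fraction("TAGCTAGGATCC", bloom, 5))
```

A read would then be assigned to the reference when its hit fraction exceeds some threshold. Because a Bloom filter never yields false negatives, a read taken verbatim from the reference always scores 1.0; the bit-array size and hash count determine how often unrelated k-mers are wrongly reported as present.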
The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other alignment-dependent scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment...
Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we showed...
Hashing has been widely used for indexing, querying and rapid similarity search in many bioinformatics applications, including sequence alignment, genome and transcriptome assembly, k-mer counting and error correction. Hence, expediting hashing operations would have a substantial impact in the field, making bioinformatics applications faster and more efficient. We present ntHash, a hashing algorithm tuned for processing DNA/RNA sequences. It performs best when calculating hash values for adjacent k-mers in an input sequence, operating an order...
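The benefit of hashing adjacent k-mers incrementally can be sketched with a generic cyclic-polynomial rolling hash. This is an illustration of the rolling-hash technique in general, not the published ntHash code: the per-base constants in `SEEDS` are arbitrary values chosen here, and the function names are hypothetical.

```python
MASK = (1 << 64) - 1  # keep all values within 64 bits

# Arbitrary 64-bit constants, one per nucleotide (assumed for illustration).
SEEDS = {
    "A": 0x0123456789ABCDEF,
    "C": 0xFEDCBA9876543210,
    "G": 0x0F1E2D3C4B5A6978,
    "T": 0x9E3779B97F4A7C15,
}


def rol(x, r):
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer):
    """Hash one k-mer from scratch: O(k) work."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ SEEDS[base]
    return h


def rolling_hashes(seq, k):
    """Hash every k-mer of seq, updating in O(1) per step after the first."""
    if len(seq) < k:
        return []
    h = hash_kmer(seq[:k])
    hashes = [h]
    for i in range(k, len(seq)):
        outgoing, incoming = seq[i - k], seq[i]
        # Rotate the previous hash, cancel the base that left the window,
        # then mix in the base that entered it.
        h = rol(h, 1) ^ rol(SEEDS[outgoing], k) ^ SEEDS[incoming]
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    seq = "ACGTACGTGGA"
    direct = [hash_kmer(seq[i:i + 4]) for i in range(len(seq) - 3)]
    print(rolling_hashes(seq, 4) == direct)
```

Each update touches only the outgoing and incoming base, so hashing all k-mers of a sequence of length n costs O(n) rather than O(nk).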
The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) were assembled from whole-genome shotgun sequencing data using ABySS. The sequencing data contained reads from both the nuclear and organellar genomes, and reads of the organellar genomes were abundant in the data as each cell harbors hundreds of mitochondria and plastids. Hence, assembly of the 123-kb plastid and 5.9-Mb mitochondrial genomes was accomplished by analyzing data sets primarily representing low coverage of the nuclear genome. The assembled genomes were annotated for their coding genes, ribosomal RNA, and transfer RNA. Transcript abundances of the genes were quantified...
Despite the rapid advance in single-cell RNA sequencing (scRNA-seq) technologies within the last decade, single-cell transcriptome analysis workflows have primarily used gene expression data while isoform sequence analysis at the single-cell level still remains fairly limited. Detection and discovery of isoforms in single cells is difficult because of the inherent technical shortcomings of scRNA-seq data, and existing transcriptome assembly methods are mainly designed for bulk RNA samples. To address this challenge, we developed RNA-Bloom, an assembly algorithm that...
Purpose: To define the molecular basis of retinal degeneration in consanguineous Pakistani pedigrees with early onset retinal degeneration. Methods: A cohort of 277 individuals representing 26 pedigrees from the Punjab province of Pakistan was analyzed. Exomes were captured with commercial kits and sequenced on an Illumina HiSeq 2500. Candidate variants were identified using standard tools and analyzed using exomeSuite to detect all potentially pathogenic changes in genes implicated in retinal degeneration. Segregation analysis was performed by dideoxy sequencing and novel...
The grizzly bear (Ursus arctos ssp. horribilis) represents the largest population of brown bears in North America. Its genome was sequenced using a microfluidic partitioning library construction technique, and these data were supplemented with sequencing data from a nanopore-based long read platform. The final assembly was 2.33 Gb with a scaffold N50 of 36.7 Mb, and the genome is of comparable size to that of its close relative the polar bear (2.30 Gb). An analysis of 4104 highly conserved mammalian genes indicated that 96.1% were found to be complete within...
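The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the assembly. A minimal sketch of that common definition follows; the function name `n50` and the example lengths are made up for illustration, and some tools differ slightly in how they treat the exact half-way point.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test for running >= total / 2
            return length


if __name__ == "__main__":
    # Hypothetical scaffold lengths summing to 290.
    print(n50([80, 70, 50, 40, 30, 20]))  # 80 + 70 = 150 >= 145, so prints 70
```

A higher N50 for the same total assembly size means that more of the genome sits in long scaffolds.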
The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies are still not achieving reference-grade. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered...
Purpose: Brimonidine is a selective alpha-2 adrenergic agonist used to reduce intraocular pressure, and it has been shown to have some neuroprotective effects. Hydroquinone (HQ) is a toxicant present in cigarette smoke and other sources. In this study, we investigated the cyto-protective effects in vitro of Brimonidine on human retinal pigment epithelium cells (ARPE-19) and human retinal Müller cells (MIO-M1) that had been treated with HQ. Methods: Cells were pretreated for 6 h with different doses of Brimonidine tartrate 0.1% (1/2×, 1×, 5×, 10×), followed by a 24-h...
Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a level to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool. Konnector uses a probabilistic and memory-efficient data...