Mark Chaisson
- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Chromosomal and Genetic Variations
- Genomics and Rare Diseases
- Genomic variations and chromosomal abnormalities
- RNA modifications and cancer
- Genetic Associations and Epidemiology
- Genetic Mapping and Diversity in Plants and Animals
- Genetic Neurodegenerative Diseases
- Algorithms and Data Compression
- CRISPR and Genetic Engineering
- Genomics and Chromatin Dynamics
- Bioinformatics and Genomic Networks
- Advanced biosensing and bioanalysis techniques
- Machine Learning in Bioinformatics
- Molecular Biology Techniques and Applications
- RNA Research and Splicing
- Cancer Genomics and Diagnostics
- Evolution and Genetic Dynamics
- Planetary Science and Exploration
- Genetic factors in colorectal cancer
- Gene expression and cancer classification
- Marine animal studies overview
- Genetics, Bioinformatics, and Biomedical Research
- Genome Rearrangement Algorithms
University of Southern California
2017-2025
USC Norris Comprehensive Cancer Center
2023-2024
Southern California University for Professional Studies
2021-2023
University of Washington
2014-2021
LAC+USC Medical Center
2021
Seattle University
2014
Pacific Biosciences (United States)
2011-2012
Cold Spring Harbor Laboratory
2012
University of California, San Diego
2001-2008
Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available aligners suffer from high mapping error rates, low mapping speed, read length limitations and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software...
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 populations. Analysing this set, we identify gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest...
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species 1–4. To address this issue, the international Genome 10K (G10K) consortium 5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate...
Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing. We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read...
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome, and 58 of the inversions intersect...
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not...
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short reads produced by short read technologies. We present a new Eulerian assembler that generates nearly optimal short read assemblies of bacterial genomes and describe an approach...
Improving on the gorilla genome. Access to complete, high-quality genomes of nonhuman primates will also help us understand human biology. Gordon et al. used long-read sequencing technology to improve the genome data for our close relative, the gorilla. Sequencing from a single individual decreased assembly fragmentation and recovered previously missed genes and noncoding loci. Mapping short-read sequences from additional gorillas helped reconstruct a “pan” gorilla sequence documenting genetic variation. Comparison with human genomes revealed...
In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%)....
A spotlight on great ape genomes. Most nonhuman primate genomes generated to date have been “humanized” owing to their many gaps and the reliance on guidance by the reference human genome. To remove this humanizing effect, Kronenberg et al. assembled long-read genomes of a chimpanzee, an orangutan, and two humans and compared them with a previously generated gorilla genome. This analysis recognized genomic structural variation specific to particular ape lineages. Comparisons between human and chimpanzee cerebral organoids showed down-regulation of the expression of genes...
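The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and OLC approaches and results in accurate genome reconstructions.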
Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to...
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing 1,2 with continuous long-read or high-fidelity 3 sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate...
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society 1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals 3,4. Recently, a telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome 5. To address these limitations, the Human Pangenome...
The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications 1,2. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13), provided a model of their homology 3, it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain...
The Human Pangenome Reference Consortium (HPRC) presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene...
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data 1,2. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions 3,4. We find that SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC), with up to 4.3 megabase pairs of SD sequence...