Tobias Marschall

ORCID: 0000-0002-9376-1030
Research Areas
  • Genomics and Phylogenetic Studies
  • Chromosomal and Genetic Variations
  • Genomics and Rare Diseases
  • Genomic variations and chromosomal abnormalities
  • RNA and protein synthesis mechanisms
  • Algorithms and Data Compression
  • Gene expression and cancer classification
  • Cancer Genomics and Diagnostics
  • DNA and Biological Computing
  • Machine Learning in Bioinformatics
  • Single-cell and spatial transcriptomics
  • Genetic Mapping and Diversity in Plants and Animals
  • Genetic Associations and Epidemiology
  • Genomics and Chromatin Dynamics
  • Molecular Biology Techniques and Applications
  • CRISPR and Genetic Engineering
  • Semigroups and automata theory
  • Genetics, Bioinformatics, and Biomedical Research
  • Genetic diversity and population structure
  • Bioinformatics and Genomic Networks
  • Genome Rearrangement Algorithms
  • Evolution and Genetic Dynamics
  • Cell Image Analysis Techniques
  • RNA Research and Splicing
  • Epigenetics and DNA Methylation

Heinrich Heine University Düsseldorf
2020-2025

Düsseldorf University Hospital
2023-2025

Max Planck Institute for Informatics
2015-2021

Saarland University
2015-2020

Centrum Wiskunde & Informatica
2010-2019

Institute of Bioinformatics
2019

Helsinki Institute for Information Technology
2015

Bielefeld University
2006-2015

Max Planck Society
2013-2015

Brown University
2013

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, Sergey Aganezov, Savannah J. Hoyt, Mark Diekhans, Glennis A. Logsdon, Michael Alonge, Stylianos E. Antonarakis, Matthew Borchers, Gerard G. Bouffard, Shelise Brooks, Gina V. Caldas, Nae-Chyun Chen, Haoyu Cheng, Chen-Shan Chin, William Chow, Leonardo Gomes de Lima, Philip C. Dishuck, Richard Durbin, Tatiana Dvorkina, Ian T. Fiddes, Giulio Formenti, Robert S. Fulton, Arkarachai Fungtammasan, Erik Garrison, Patrick G. S. Grady, Tina A. Graves-Lindsay, Ira M. Hall, Nancy F. Hansen, Gabrielle A. Hartley, Marina Haukness, Kerstin Howe, Michael W. Hunkapiller, Chirag Jain, Miten Jain, Erich D. Jarvis, Peter Kerpedjiev, Melanie Kirsche, Mikhail Kolmogorov, Jonas Korlach, Milinn Kremitzki, Heng Li, Valerie V. Maduro, Tobias Marschall, Ann M. Mc Cartney, Jennifer McDaniel, Danny E. Miller, James C. Mullikin, Eugene W. Myers, Nathan D. Olson, Benedict Paten, Paul Peluso, Pavel A. Pevzner, David Porubský, Tamara Potapova, Е. И. Рогаев, Jeffrey Rosenfeld, Steven L. Salzberg, Valérie Schneider, Fritz J. Sedlazeck, Kishwar Shafin, Colin J. Shew, Alaina Shumate, Ying Sims, Arian F. A. Smit, Daniela C. Soto, Ivan Sović, Jessica M. Storer, Aaron Streets, Beth A. Sullivan, Françoise Thibaud‐Nissen, James Torrance, Justin Wagner, Brian P. Walenz, Aaron M. Wenger, Jonathan Wood, Chunlin Xiao, Stephanie M. Yan, Alice Young, Samantha Zarate, Urvashi Surti, Rajiv C. McCoy, Megan Y. Dennis, Ivan A. Alexandrov, Jennifer L. Gerton, Rachel J. O’Neill, Winston Timp, Justin M. Zook, Michael C. Schatz, Evan E. Eichler, Karen H. Miga, Adam M. Phillippy

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be...

10.1126/science.abj6987 article EN Science 2022-03-31
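
Assembly gaps are conventionally stored in FASTA sequences as runs of the placeholder base `N`, so whether a chromosome sequence is "gapless" can be checked by scanning for such runs. A minimal sketch, assuming gaps are encoded as `N`/`n` runs in the sequence itself (the helper names `find_gaps` and `is_gapless` are hypothetical):

```python
import re

def find_gaps(seq, min_len=1):
    """Return 0-based, half-open (start, end) intervals of N runs at least min_len long."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]

def is_gapless(seq):
    # Treat a sequence as gapless only if it contains no N run at all.
    return not find_gaps(seq)

print(find_gaps("ACGTNNNNACGTnnA"))  # [(4, 8), (12, 14)]
```

Assemblies may also describe gaps in separate AGP files; this sketch only inspects the sequence.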

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect...

10.1038/s41467-018-08148-z article EN cc-by Nature Communications 2019-04-16
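
The abstract above separates indels (<50 bp) from SVs (≥50 bp). For a simple VCF-style REF/ALT pair, that size class follows from the length difference between the alleles. A sketch with a hypothetical `size_class` helper; balanced events such as inversions change no length and so cannot be classified by size alone:

```python
def size_class(ref, alt, sv_threshold=50):
    """Classify a REF/ALT allele pair as 'SNV', 'indel', 'SV' or 'other'.

    Uses the 50 bp boundary between indels and SVs. Balanced changes
    (e.g. inversions or multi-base substitutions) have no length difference
    and are reported as 'other'.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    diff = abs(len(alt) - len(ref))
    if diff == 0:
        return "other"
    return "SV" if diff >= sv_threshold else "indel"

print(size_class("A", "ACGT"))  # indel (3 bp insertion after the anchor base)
```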

10.1038/ng.3021 article EN Nature Genetics 2014-06-29

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short...

10.1016/j.cell.2022.08.004 article EN cc-by Cell 2022-09-01
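
Complete trios allow genotype calls to be checked for Mendelian consistency: the child must carry one allele from each parent. A minimal sketch for biallelic autosomal sites, with genotypes given as alt-allele counts (0, 1 or 2); the function name is hypothetical, and genuine de novo mutations will also show up as violations:

```python
from itertools import product

# Alt-allele count (0, 1 or 2) -> the two alleles it implies (0 = ref, 1 = alt).
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}

def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(m + f == child
               for m, f in product(ALLELES[mother], ALLELES[father]))

# A heterozygous child of two homozygous-reference parents is a violation.
print(is_mendelian_consistent(1, 0, 0))  # False
```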

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not...

10.1126/science.abf7117 article EN Science 2021-02-25
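
The contiguity measure quoted above, the minimum contig length needed to cover 50% of the genome, belongs to the N50 family of statistics. A sketch, assuming contig lengths are given as integers; with `genome_size` supplied it measures against the genome (commonly called NG50), otherwise against the assembly's total length (N50). The function name is hypothetical:

```python
from typing import List, Optional

def n50(lengths: List[int], genome_size: Optional[int] = None) -> int:
    """Length of the shortest contig among the longest contigs that together
    cover at least half of the target size (0 if the target is never reached)."""
    target = genome_size if genome_size is not None else sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= target:  # integer test avoids rounding half the target
            return length
    return 0

contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))                   # 70: 80 + 70 = 150 is half of 300
print(n50(contigs, genome_size=400))  # 50: 80 + 70 + 50 = 200 is half of 400
```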

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced...

10.1038/s41587-020-0503-6 article EN cc-by Nature Biotechnology 2020-05-04
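
Fold coverage, as in the 63× overall and the 6.5× in reads >100 kb quoted above, is the number of sequenced bases divided by the genome size. A sketch with a hypothetical `fold_coverage` helper and toy read lengths:

```python
def fold_coverage(read_lengths, genome_size, min_length=0):
    """Total bases in reads longer than min_length, divided by the genome size."""
    return sum(length for length in read_lengths if length > min_length) / genome_size

reads = [50_000, 150_000, 200_000]
print(fold_coverage(reads, genome_size=100_000))                      # 4.0
print(fold_coverage(reads, genome_size=100_000, min_length=100_000))  # 3.5
```

Under an assumed genome size of 3.1 Gb, for instance, 63× corresponds to roughly 195 Gb of read sequence.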

The human genome is diploid, which requires assigning heterozygous single nucleotide polymorphisms (SNPs) to the two copies of the genome. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which are oblivious to direct read information, constitute the state of the art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that sequencing reads of the current generation are too short to serve the purposes...

10.1089/cmb.2014.0157 article EN Journal of Computational Biology 2015-02-06
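
Haplotype assembly is commonly formalised as minimum error correction (MEC): split the reads into two haplotypes so that as few allele observations as possible must be flipped. WhatsHap solves a weighted version with a dynamic program whose running time is exponential only in coverage; the sketch below instead enumerates every read bipartition, so it is only feasible for a handful of reads. It ignores weights, assumes all SNPs are heterozygous, and its function names are hypothetical:

```python
from itertools import product
from typing import Dict, List, Tuple

Read = Dict[int, int]  # SNP index -> observed allele (0 or 1)

def mec_cost(reads: List[Read], assignment: Tuple[int, ...], n_snps: int):
    """Corrections needed when read i is placed on haplotype assignment[i].

    Haplotype 1 is taken as the complement of haplotype 0 (all SNPs
    heterozygous). Returns (cost, haplotype 0 as a list of alleles).
    """
    cost, hap0 = 0, []
    for j in range(n_snps):
        counts = [[0, 0], [0, 0]]  # counts[h][a]: reads on haplotype h with allele a at SNP j
        for read, h in zip(reads, assignment):
            if j in read:
                counts[h][read[j]] += 1
        flips_if_0 = counts[0][1] + counts[1][0]  # haplotype 0 carries allele 0 here
        flips_if_1 = counts[0][0] + counts[1][1]  # haplotype 0 carries allele 1 here
        hap0.append(0 if flips_if_0 <= flips_if_1 else 1)
        cost += min(flips_if_0, flips_if_1)
    return cost, hap0

def phase_brute_force(reads: List[Read], n_snps: int):
    """Exhaustive MEC over bipartitions of one or more reads: (cost, hap0, assignment)."""
    best = None
    # Pin the first read to haplotype 0; mirrored bipartitions have equal cost.
    for rest in product((0, 1), repeat=len(reads) - 1):
        assignment = (0,) + rest
        cost, hap0 = mec_cost(reads, assignment, n_snps)
        if best is None or cost < best[0]:
            best = (cost, hap0, assignment)
    return best

reads = [{0: 0, 1: 0, 2: 0}, {1: 0, 2: 0}, {0: 1, 1: 1},
         {0: 1, 1: 1, 2: 1}, {1: 1, 2: 0}]  # the last read carries one error
cost, hap0, _ = phase_brute_force(reads, n_snps=3)
print(cost, hap0)  # 1 [0, 0, 0]
```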

Read-based phasing allows one to reconstruct the haplotypes of a sample purely from sequencing reads. While phasing is an important step for answering questions about population genetics and compound heterozygosity, and for aiding clinical decision making, there has been a lack of accurate, usable and standards-based software. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore...

10.1101/085050 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2016-11-02
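
Read-based phasing results are typically written back to VCF: in the VCF specification, a `|` separator in `GT` marks a phased genotype and the `PS` FORMAT field names the phase set it belongs to, a convention WhatsHap can use for its phase blocks. A minimal parser sketch, assuming a single-sample, tab-separated VCF (the `phase_blocks` name is hypothetical):

```python
def phase_blocks(vcf_lines):
    """Group phased calls of a single-sample VCF by (chromosome, PS value).

    Assumes column 9 is FORMAT and column 10 the sample; unphased genotypes
    (with '/') and records without a PS value are skipped.
    """
    blocks = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt, ps = sample.get("GT", ""), sample.get("PS", ".")
        if "|" not in gt or ps == ".":
            continue
        blocks.setdefault((fields[0], ps), []).append((int(fields[1]), gt))
    return blocks

vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:PS\t0|1:100",
    "chr1\t150\t.\tC\tT\t.\tPASS\t.\tGT:PS\t1|0:100",
    "chr1\t900\t.\tG\tA\t.\tPASS\t.\tGT\t0/1",
]
print(phase_blocks(vcf))  # {('chr1', '100'): [(100, '0|1'), (150, '1|0')]}
```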

Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to the detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine...

10.1038/s41467-017-01343-4 article EN cc-by Nature Communications 2017-10-31
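
Long reads that span a rearrangement align in pieces (split reads), and each split marks a candidate breakpoint junction. One simple way to turn these into SV candidates is to cluster nearby junctions and require support from several reads. This is an illustration, not the NanoSV algorithm; the names, the window and the support threshold are hypothetical:

```python
from typing import List, NamedTuple

class Junction(NamedTuple):
    """One split-read junction: the two reference positions a read jumps between."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int

def cluster_junctions(junctions: List[Junction], window: int = 100,
                      min_support: int = 2) -> List[List[Junction]]:
    """Greedily group junctions whose two ends both lie within `window` bp of a
    cluster's first member; keep clusters supported by >= min_support reads."""
    clusters: List[List[Junction]] = []
    for jn in sorted(junctions):
        for cluster in clusters:
            seed = cluster[0]
            if ((jn.chrom_a, jn.chrom_b) == (seed.chrom_a, seed.chrom_b)
                    and abs(jn.pos_a - seed.pos_a) <= window
                    and abs(jn.pos_b - seed.pos_b) <= window):
                cluster.append(jn)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([jn])
    return [c for c in clusters if len(c) >= min_support]

js = [Junction("chr1", 1000, "chr5", 5000), Junction("chr1", 1030, "chr5", 4980),
      Junction("chr1", 1010, "chr5", 5020), Junction("chr2", 200, "chr2", 90000)]
print(len(cluster_junctions(js)))  # 1 cluster (chr1-chr5) with three supporting reads
```

The pairwise scan is quadratic, which is acceptable for a sketch but not for genome-scale data.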

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate...

10.1038/s41587-020-0719-5 article EN cc-by Nature Biotechnology 2020-12-07
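
In single-cell strand sequencing (Strand-seq), each read's orientation reveals the template strand (Watson or Crick) that a cell inherited for each homolog, so a genomic bin shows mostly Watson reads (WW), mostly Crick reads (CC), or a roughly even mix (WC). In WC regions the two homologs are separated by strand, which is what makes chromosome-wide phasing possible. A sketch of the per-bin call; the function name and both thresholds are illustrative assumptions:

```python
def strand_state(watson, crick, min_reads=10, margin=0.2):
    """Call a genomic bin 'WW', 'CC' or 'WC' from Watson/Crick read counts.

    Bins with fewer than min_reads reads, or with a Watson fraction between
    the WW/CC and WC bands, are returned as 'NA'.
    """
    total = watson + crick
    if total < min_reads:
        return "NA"  # too little data in this bin
    w_frac = watson / total
    if w_frac >= 1 - margin:
        return "WW"
    if w_frac <= margin:
        return "CC"
    if abs(w_frac - 0.5) <= margin:
        return "WC"
    return "NA"  # ambiguous mixture

print([strand_state(w, c) for w, c in [(20, 0), (1, 19), (9, 11)]])  # ['WW', 'CC', 'WC']
```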

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved...

10.1038/s41587-020-0711-0 article EN cc-by Nature Biotechnology 2020-12-07
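
Long-range conformation (Hi-C) read pairs can link heterozygous alleles that lie in different phase blocks, and most such pairs come from a single chromosome copy. That suggests a majority vote for the relative orientation of two blocks. The sketch assumes each link has already been reduced to the haplotype labels of the alleles it touches; it is not DipAsm's actual pipeline, and the function name is hypothetical:

```python
def orient_block(links):
    """Decide whether block B must be flipped relative to block A.

    links: (hap_in_a, hap_in_b) pairs, each 0 or 1, one per Hi-C read pair whose
    ends cover heterozygous alleles in blocks A and B. Returns (flip, support),
    where support is the fraction of links agreeing with the decision.
    """
    links = list(links)
    if not links:
        return False, 0.0  # no evidence: keep the current orientation
    cis = sum(1 for a, b in links if a == b)
    trans = len(links) - cis
    return trans > cis, max(cis, trans) / len(links)

print(orient_block([(0, 0), (1, 1), (0, 1), (0, 0)]))  # (False, 0.75)
```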

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with a substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing...

10.1038/s41588-022-01043-w article EN cc-by Nature Genetics 2022-04-01
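
A k-mer-based genotyper decides which alleles are present from counts of k-mers that are unique to one allele. PanGenie combines such counts with a hidden Markov model over the haplotypes of the pangenome; the sketch below keeps only the counting idea and ignores reverse complements, k-mers shared with the rest of the genome, and the haplotype model. Its function names and parameters are hypothetical:

```python
from collections import Counter

def kmers(seq, k):
    """All k-mers of seq, forward strand only (reverse complements ignored)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def naive_genotype(ref_seq, alt_seq, reads, k=5, min_count=1):
    """Genotype one biallelic variant from allele-specific k-mers seen in reads.

    ref_seq/alt_seq: the sequence around the variant carrying each allele.
    Returns '0/0', '0/1', '1/1', or './.' when neither allele is observed.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    ref_only = set(kmers(ref_seq, k)) - set(kmers(alt_seq, k))
    alt_only = set(kmers(alt_seq, k)) - set(kmers(ref_seq, k))
    has_ref = any(counts[km] >= min_count for km in ref_only)
    has_alt = any(counts[km] >= min_count for km in alt_only)
    if has_ref and has_alt:
        return "0/1"
    if has_alt:
        return "1/1"
    return "0/0" if has_ref else "./."

ref, alt = "ACGTACGGTCA", "ACGTATGGTCA"  # toy alleles differing by C>T
print(naive_genotype(ref, alt, [ref, alt]))  # 0/1
```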

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, Sergey Aganezov, Savannah J. Hoyt, Mark Diekhans, Glennis A. Logsdon, Michael Alonge, Stylianos E. Antonarakis, Matthew Borchers, Gerard G. Bouffard, Shelise Brooks, Gina V. Caldas, Haoyu Cheng, Chen-Shan Chin, William Chow, Leonardo Gomes de Lima, Philip C. Dishuck, Richard Durbin, Tatiana Dvorkina, Ian T. Fiddes, Giulio Formenti, Robert S. Fulton, Arkarachai Fungtammasan, Erik Garrison, Patrick G. S. Grady, Tina A. Graves-Lindsay, Ira M. Hall, Nancy F. Hansen, Gabrielle A. Hartley, Marina Haukness, Kerstin Howe, Michael W. Hunkapiller, Chirag Jain, Miten Jain, Erich D. Jarvis, Peter Kerpedjiev, Melanie Kirsche, Mikhail Kolmogorov, Jonas Korlach, Milinn Kremitzki, Heng Li, Valerie V. Maduro, Tobias Marschall, Ann M. Mc Cartney, Jennifer McDaniel, Danny E. Miller, James C. Mullikin, Eugene W. Myers, Nathan D. Olson, Benedict Paten, Paul Peluso, Pavel A. Pevzner, David Porubský, Tamara Potapova, Е. И. Рогаев, Jeffrey Rosenfeld, Steven L. Salzberg, Valérie Schneider, Fritz J. Sedlazeck, Kishwar Shafin, Colin J. Shew, Alaina Shumate, Yumi Sims, Arian F. A. Smit, Daniela C. Soto, Ivan Sović, Jessica M. Storer, Aaron Streets, Beth A. Sullivan, Françoise Thibaud‐Nissen, James Torrance, Justin Wagner, Brian P. Walenz, Aaron M. Wenger, Jonathan Wood, Chunlin Xiao, Stephanie M. Yan, Alice Young, Samantha Zarate, Urvashi Surti, Rajiv C. McCoy, Megan Y. Dennis, Ivan A. Alexandrov, Jennifer L. Gerton, Rachel J. O’Neill, Winston Timp, Justin M. Zook, Michael C. Schatz, Evan E. Eichler, Karen H. Miga, Adam M. Phillippy

In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to...

10.1101/2021.05.26.445798 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-05-27

Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant...

10.1186/s13059-020-02157-2 article EN cc-by Genome biology 2020-09-24
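
Aligning a read to a genome graph generalises edit-distance dynamic programming: each graph node takes its previous column from the best of its predecessors instead of a single left neighbour. The sketch below handles only acyclic graphs with one character per node and computes plain edit distance; GraphAligner itself also supports cyclic graphs and uses far more efficient techniques, so this is not its algorithm. The function name and graph encoding are hypothetical:

```python
def align_to_dag(query, labels, preds):
    """Minimum edit distance of `query` to any path in a character-labelled DAG.

    labels: node characters, with nodes numbered in topological order.
    preds: preds[v] lists the predecessors of node v (all smaller than v).
    The path may start and end at any node (semi-global in the graph).
    """
    n = len(query)
    best = n  # aligning the query to an empty path costs n insertions
    rows = []
    for v, char in enumerate(labels):
        # prev[i]: best cost of query[:i] on a path ending just before v;
        # starting a fresh path at v costs i insertions.
        prev = [min([i] + [rows[u][i] for u in preds[v]]) for i in range(n + 1)]
        row = [prev[0] + 1] + [0] * n  # node v unmatched by an empty query prefix
        for i in range(1, n + 1):
            row[i] = min(prev[i - 1] + (query[i - 1] != char),  # match or mismatch
                         prev[i] + 1,                             # skip graph character
                         row[i - 1] + 1)                          # skip query character
        rows.append(row)
        best = min(best, row[n])
    return best

# Toy graph spelling AC, then a G/T bubble, then A: paths ACGA and ACTA.
labels = ["A", "C", "G", "T", "A"]
preds = [[], [0], [1], [1], [2, 3]]
print(align_to_dag("ACTA", labels, preds))  # 0
print(align_to_dag("ACCA", labels, preds))  # 1
```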

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome...

10.1038/s41586-022-05325-5 article EN cc-by Nature 2022-10-19

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, the benchmark includes 92% of the autosomal GRCh38...

10.1016/j.xgen.2022.100128 article EN cc-by Cell Genomics 2022-04-28
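
Benchmarks like these are applied by comparing a pipeline's calls with the truth set inside the benchmark regions and reporting precision and recall. A simplified sketch with a hypothetical `benchmark` function; it matches variants exactly, whereas established comparison tools such as hap.py and RTG vcfeval also reconcile different representations of the same haplotype:

```python
def benchmark(calls, truth, confident=None):
    """Precision, recall and F1 of calls against truth (exact allele matching).

    calls, truth: sets of (chrom, pos, ref, alt) tuples. If confident is given,
    a set of (chrom, pos), only variants at those positions are scored,
    standing in for the benchmark's confident regions.
    """
    if confident is not None:
        calls = {v for v in calls if v[:2] in confident}
        truth = {v for v in truth if v[:2] in confident}
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr1", 300, "G", "A"), ("chr1", 400, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 500, "A", "C")}
print(benchmark(calls, truth))  # precision 2/3, recall 0.5, F1 4/7
```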

Glenn Hickey, Jean Monlong, Jana Ebler, Adam M. Novak, Jordan M. Eizenga, Yan Gao, Haley Abel, Lucinda Antonacci-Fulton, Mobin Asri, Gunjan Baid, Carl Baker, Anastasiya Belyaeva, Konstantinos Billis, Guillaume Bourque, Silvia Buonaiuto, Andrew Carroll, Mark Chaisson, Pi-Chuan Chang, Xian Chang, Haoyu Cheng, Justin Chu, Sarah Cody, Vincenza Colonna, Daniel E. Cook, Robert Cook‐Deegan, Omar E. Cornejo, Mark Diekhans, Daniel Doerr, Peter Ebert, Jana Ebler, Evan E. Eichler, Susan Fairley, Olivier Fédrigo, Adam L. Felsenfeld, Xiaowen Feng, Christian Fischer, Paul Flicek, Giulio Formenti, Adam Frankish, Robert S. Fulton, Shilpa Garg, Erik Garrison, Nanibaa’ A. Garrison, Carlos García Girón, Richard E. Green, Cristian Groza, Andrea Guarracino, Leanne Haggerty, Ira M. Hall, William T. Harvey, Marina Haukness, David Haussler, Simon Heumos, Kendra Hoekzema, Thibaut Hourlier, Kerstin Howe, Miten Jain, Erich D. Jarvis, Hanlee P. Ji, Eimear E. Kenny, Barbara A. Koenig, Alexey Kolesnikov, Jan O. Korbel, Jennifer Kordosky, Sergey Koren, HoJoon Lee, Alexandra P. Lewis, Wen‐Wei Liao, Shuangjia Lu, Tsung-Yu Lu, Julian Lucas, Hugo Magalhães, Santiago Marco‐Sola, Pierre Marijon, Charles Markello, Tobias Marschall, Fergal J. Martin, Ann M. Mc Cartney, Jennifer McDaniel, Karen H. Miga, Matthew W. Mitchell, Jacquelyn Mountcastle, Katherine M. Munson, Moses Njagi Mwaniki, Maria Nattestad, Sergey Nurk, Hugh E. Olsen, Nathan D. Olson, Trevor Pesout, Adam M. Phillippy, Alice B. Popejoy, David Porubský, Pjotr Prins, Daniela Puiu, Mikko Rautiainen, Allison Regier, Arang Rhie, Samuel Sacco, Ashley D. Sanders, Valérie Schneider, et al.

10.1038/s41587-023-01793-w article EN Nature Biotechnology 2023-05-10

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous...

10.1016/j.cell.2022.04.017 article EN cc-by-nc Cell 2022-05-01
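
Whether flanking repeats can mediate an inversion depends on their relative orientation: non-allelic homologous recombination between inversely oriented copies inverts the intervening segment, whereas recombination between directly oriented copies deletes or duplicates it. A sketch that classifies a pair of equal-length repeat copies by ungapped identity; the function names and the 0.9 identity cutoff are illustrative assumptions:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences (ungapped)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def repeat_orientation(left, right, min_identity=0.9):
    """Classify two flanking repeat copies as 'direct', 'inverted' or 'unrelated'."""
    direct = identity(left, right)
    inverted = identity(left, reverse_complement(right))
    if max(direct, inverted) < min_identity:
        return "unrelated"
    return "direct" if direct >= inverted else "inverted"

copy = "ACGTTGCAAG"
print(repeat_orientation(copy, reverse_complement(copy)))  # inverted
```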

Andrea Guarracino, Silvia Buonaiuto, Leonardo Gomes de Lima, Tamara Potapova, Arang Rhie, Sergey Koren, Boris Rubinstein, Christian Fischer, Haley Abel, Lucinda Antonacci-Fulton, Mobin Asri, Gunjan Baid, Carl Baker, Anastasiya Belyaeva, Konstantinos Billis, Guillaume Bourque, Andrew Carroll, Mark Chaisson, Pi-Chuan Chang, Xian Chang, Haoyu Cheng, Justin Chu, Sarah Cody, Daniel E. Cook, Robert Cook‐Deegan, Omar E. Cornejo, Mark Diekhans, Daniel Doerr, Peter Ebert, Jana Ebler, Evan E. Eichler, Jordan M. Eizenga, Susan Fairley, Olivier Fédrigo, Adam L. Felsenfeld, Xiaowen Feng, Paul Flicek, Giulio Formenti, Adam Frankish, Robert S. Fulton, Yan Gao, Shilpa Garg, Nanibaa’ A. Garrison, Carlos García Girón, Richard E. Green, Cristian Groza, Leanne Haggerty, Ira M. Hall, William T. Harvey, Marina Haukness, David Haussler, Simon Heumos, Glenn Hickey, Kendra Hoekzema, Thibaut Hourlier, Kerstin Howe, Miten Jain, Erich D. Jarvis, Hanlee P. Ji, Eimear E. Kenny, Barbara A. Koenig, Alexey Kolesnikov, Jan O. Korbel, Jennifer Kordosky, HoJoon Lee, Alexandra P. Lewis, Heng Li, Wen‐Wei Liao, Shuangjia Lu, Tsung-Yu Lu, Julian Lucas, Hugo Magalhães, Santiago Marco‐Sola, Pierre Marijon, Charles Markello, Tobias Marschall, Fergal J. Martin, Ann M. Mc Cartney, Jennifer McDaniel, Karen H. Miga, Matthew W. Mitchell, Jean Monlong, Jacquelyn Mountcastle, Katherine M. Munson, Moses Njagi Mwaniki, Maria Nattestad, Adam M. Novak, Sergey Nurk, Hugh E. Olsen, Nathan D. Olson, Benedict Paten, Trevor Pesout, Alice B. Popejoy, David Porubský, Pjotr Prins, Daniela Puiu, Mikko Rautiainen, Allison Regier, Samuel Sacco, Ashley D. Sanders, et al.

The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications [1,2]. Although the resolution of these regions in the first complete assembly of a human genome (the Telomere-to-Telomere Consortium’s CHM13 assembly, T2T-CHM13) provided a model of their homology [3], it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain...

10.1038/s41586-023-05976-y article EN cc-by Nature 2023-05-10