NFDI4DS | UHH-SEMS - Publication Details

Tobias Marschall

ORCID: 0000-0002-9376-1030

About

Contact & Profiles

OpenAlex ID: A5067570776

Research Areas

Genomics and Phylogenetic Studies
Chromosomal and Genetic Variations
Genomics and Rare Diseases
Genomic variations and chromosomal abnormalities
RNA and protein synthesis mechanisms
Algorithms and Data Compression
Gene expression and cancer classification
Cancer Genomics and Diagnostics
DNA and Biological Computing
Machine Learning in Bioinformatics
Single-cell and spatial transcriptomics
Genetic Mapping and Diversity in Plants and Animals
Genetic Associations and Epidemiology
Genomics and Chromatin Dynamics
Molecular Biology Techniques and Applications
CRISPR and Genetic Engineering
Semigroups and automata theory
Genetics, Bioinformatics, and Biomedical Research
Genetic diversity and population structure
Bioinformatics and Genomic Networks
Genome Rearrangement Algorithms
Evolution and Genetic Dynamics
Cell Image Analysis Techniques
RNA Research and Splicing
Epigenetics and DNA Methylation

Heinrich Heine University Düsseldorf
2020-2025

Düsseldorf University Hospital
2023-2025

Max Planck Institute for Informatics
2015-2021

Saarland University
2015-2020

Centrum Wiskunde & Informatica
2010-2019

Institute of Bioinformatics
2019

Helsinki Institute for Information Technology
2015

Bielefeld University
2006-2015

Max Planck Society
2013-2015

Brown University
2013

The complete sequence of a human genome

OPENALEX - Publications

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, and 95 more

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be...

10.1126/science.abj6987 article EN Science 2022-03-31

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome

Aaron M. Wenger, Paul Peluso, William J. Rowell, Pi-Chuan Chang, Richard Hall, and 23 more

10.1038/s41587-019-0217-9 article EN Nature Biotechnology 2019-08-12

Multi-platform discovery of haplotype-resolved structural variation in human genomes

Mark Chaisson, Ashley D. Sanders, Xuefang Zhao, Ankit Malhotra, David Porubský, and 92 more

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome, and 58 of the inversions intersect...

10.1038/s41467-018-08148-z article EN cc-by Nature Communications 2019-04-16

Whole-genome sequence variation, population structure and demographic history of the Dutch population

Laurent C. Francioli, Androniki Menelaou, Sara L. Pulit, Freerk van Dijk, Pier Francesco Palamara, and 79 more

10.1038/ng.3021 article EN Nature Genetics 2014-06-29

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao, Anna O. Basile, Haley Abel, and 37 more

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short...

10.1016/j.cell.2022.08.004 article EN cc-by Cell 2022-09-01

Haplotype-resolved diverse human genomes and integrated analysis of structural variation

Peter Ebert, Peter A. Audano, Qihui Zhu, Bernardo Rodríguez-Martín, David Porubský, and 60 more

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not...

10.1126/science.abf7117 article EN Science 2021-02-25

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes

Kishwar Shafin, Trevor Pesout, Ryan Lorig-Roach, Marina Haukness, Hugh E. Olsen, and 27 more

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced...

10.1038/s41587-020-0503-6 article EN cc-by Nature Biotechnology 2020-05-04

WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads

Murray Patterson, Tobias Marschall, Nadia Pisanti, Leo van Iersel, Leen Stougie, and 2 more

The human genome is diploid, which requires assigning heterozygous single nucleotide polymorphisms (SNPs) to the two copies of the genome. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which are oblivious to direct read information, constitute the state-of-the-art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that sequencing reads of the current generation are too short to serve the purposes...

10.1089/cmb.2014.0157 article EN Journal of Computational Biology 2015-02-06

WhatsHap: fast and accurate read-based phasing

Marcel Martin, Murray Patterson, Shilpa Garg, Sarah O. Fischer, Nadia Pisanti, and 3 more

Read-based phasing makes it possible to reconstruct the haplotypes of a sample purely from sequencing reads. While phasing is an important step for answering questions about population genetics and compound heterozygosity, and for aiding clinical decision making, there has been a lack of accurate, usable and standards-based software. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore...

10.1101/085050 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2016-11-02

Mapping and phasing of structural variation in patient genomes using nanopore sequencing

Mircea Cretu Stancu, Markus J. van Roosmalen, Ivo Renkens, Marleen M. Nieboer, Sjors Middelkamp, and 12 more

Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine...

10.1038/s41467-017-01343-4 article EN cc-by Nature Communications 2017-10-31

A robust benchmark for detection of germline large deletions and insertions

Justin M. Zook, Nancy F. Hansen, Nathan D. Olson, Lesley M. Chapman, James C. Mullikin, and 45 more

10.1038/s41587-020-0538-8 article EN Nature Biotechnology 2020-06-15

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

David Porubský, Peter Ebert, Peter A. Audano, Mitchell R. Vollger, William T. Harvey, and 16 more

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate...

10.1038/s41587-020-0719-5 article EN cc-by Nature Biotechnology 2020-12-07

Chromosome-scale, haplotype-resolved assembly of human genomes

Shilpa Garg, Arkarachai Fungtammasan, Andrew Carroll, Mike Chou, Anthony D. Schmitt, and 17 more

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved...

10.1038/s41587-020-0711-0 article EN cc-by Nature Biotechnology 2020-12-07

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Jana Ebler, Peter Ebert, Wayne E. Clarke, Tobias Rausch, Peter A. Audano, and 7 more

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing...

10.1038/s41588-022-01043-w article EN cc-by Nature Genetics 2022-04-01

The complete sequence of a human genome

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, and 94 more

In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to...

10.1101/2021.05.26.445798 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-05-27

GraphAligner: rapid and versatile sequence-to-graph alignment

Mikko Rautiainen, Tobias Marschall

Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant...

10.1186/s13059-020-02157-2 article EN cc-by Genome Biology 2020-09-24

Semi-automated assembly of high-quality diploid human reference genomes

Erich D. Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, and 78 more

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome...

10.1038/s41586-022-05325-5 article EN cc-by Nature 2022-10-19

Benchmarking challenging small variants with linked and long reads

Justin Wagner, Nathan D. Olson, Lindsay Harris, Ziad Khan, Jesse Farek, and 36 more

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38...

10.1016/j.xgen.2022.100128 article EN cc-by Cell Genomics 2022-04-28

Pangenome graph construction from genome alignments with Minigraph-Cactus

Glenn Hickey, Jean Monlong, Jana Ebler, Adam M. Novak, Jordan M. Eizenga, and 95 more

10.1038/s41587-023-01793-w article EN Nature Biotechnology 2023-05-10

Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders

David Porubský, Wolfram Höps, Hufsah Ashraf, PingHsun Hsieh, Bernardo Rodríguez-Martín, and 20 more

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous...

10.1016/j.cell.2022.04.017 article EN cc-by-nc Cell 2022-05-01

Recombination between heterologous human acrocentric chromosomes

Andrea Guarracino, Silvia Buonaiuto, Leonardo Gomes de Lima, Tamara Potapova, Arang Rhie, and 95 more

The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13), provided a model of their homology, it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain...

10.1038/s41586-023-05976-y article EN cc-by Nature 2023-05-10
