Justin Wagner
- Genomics and Phylogenetic Studies
- Cancer Genomics and Diagnostics
- Genomics and Rare Diseases
- Chromosomal and Genetic Variations
- Genomics and Chromatin Dynamics
- Genomic variations and chromosomal abnormalities
- Bioinformatics and Genomic Networks
- Gut microbiota and health
- CRISPR and Genetic Engineering
- Gene expression and cancer classification
- RNA and protein synthesis mechanisms
- Molecular Biology Techniques and Applications
- Genetics, Bioinformatics, and Biomedical Research
- Evolution and Genetic Dynamics
- Law, AI, and Intellectual Property
- Cancer-related molecular mechanisms research
- RNA Research and Splicing
- RNA modifications and cancer
- Vaccines and immunoinformatics approaches
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- Genetic factors in colorectal cancer
- Ethics in Clinical Research
- Single-cell and spatial transcriptomics
- Cell Image Analysis Techniques
- Genetic Mapping and Diversity in Plants and Animals
National Institute of Standards and Technology
2019-2025
Material Measurement Laboratory
2019-2025
National Institute of Standards
2022-2024
University of Antwerp
2023
Information Technology Laboratory
2022
Mitre (United States)
2022
University of Alabama in Huntsville
2022
University of Pittsburgh
2022
University of Maryland, College Park
2014-2021
Research Institute for Advanced Computer Science
2017
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be...
Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary...
Abstract In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to...
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 call sets for one or more sequencing technologies (Illumina, PacBio HiFi, Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. The submissions included numerous...
Abstract The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome...
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, 92% of the autosomal GRCh38...
The secondary injury cascade that is activated following traumatic brain injury (TBI) induces responses from multiple physiological systems, including the immune system. These responses are not limited to the area of injury; they can also alter peripheral organs such as the intestinal tract. Gut microbiota play a role in the regulation of immune cell populations and microglia activation, and microbiome dysbiosis is implicated in immune dysregulation and behavioral abnormalities. However, the changes in the gut microbiota induced after acute TBI remain largely unexplored....
Abstract Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful: the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each...
Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of pangenome structural and haplotype variation at multiple scales. We apply the graph decomposition methods in PGR-TK to the class II major...
Abstract The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on the male X and Y chromosomes, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions across...
Abstract Background: Thousands of experiments and studies use the human reference genome as a resource each year. This single genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple populations to avoid potential biases. Results: Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific reference genome. This genome is more contiguous and complete than the latest version of the human reference genome and is annotated with highly similar gene...
Abstract Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 Mbp of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome to clinical and functional study. Here we demonstrate how the new reference universally improves read mapping and variant calling for 3,202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of novel variants per sample, a new frontier for evolutionary and biomedical discovery. Simultaneously,...
Summary The precisionFDA Truth Challenge V2 aimed to assess the state-of-the-art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their pipelines and submitted 64 callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome...
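Benchmarking a callset against a truth set such as the GIAB benchmarks is commonly summarized with precision, recall, and F1. The sketch below assumes variants have already been matched to the benchmark, so only true-positive (TP), false-positive (FP), and false-negative (FN) counts remain. `BenchmarkCounts` and `summarize` are hypothetical names. Real comparison tools also reconcile differing representations of the same variant, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCounts:
    """Hypothetical counts from comparing one callset to a benchmark set."""
    tp: int  # calls that match a benchmark variant
    fp: int  # calls with no matching benchmark variant
    fn: int  # benchmark variants the callset missed


def summarize(counts: BenchmarkCounts) -> dict:
    """Return precision, recall, and F1 (the harmonic mean of the two)."""
    called = counts.tp + counts.fp
    truth = counts.tp + counts.fn
    precision = counts.tp / called if called else 0.0
    recall = counts.tp / truth if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only; not results from the challenge.
    print(summarize(BenchmarkCounts(tp=90, fp=10, fn=30)))
```

Per-region summaries follow by computing the same counts separately within each genome stratification.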
Summary Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling methods. Here, we use accurate linked reads and long reads to expand the prior benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our benchmark adds more than 300,000 SNVs and 50,000 indels, and 16% more exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2). For HG002, 92%...
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling and representation, and a lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants in the GIAB HG002 individual to create a tandem repeat benchmark. We also...
Abstract Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of “stratifications,” which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions...
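Because stratifications are plain BED files, context-dependent performance can be estimated by checking which intervals each variant falls in. A minimal sketch, assuming each stratification's intervals are already merged (non-overlapping) and each variant has already been labeled as an error or not; the stratification names and the functions `load_bed`, `in_bed`, and `stratify` are hypothetical, not part of the published resource. BED intervals are 0-based and half-open, while VCF positions are 1-based, so the lookup converts between them.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Index BED lines as {chrom: (starts, ends)}; assumes merged intervals."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def in_bed(index, chrom, pos):
    """True if 1-based position `pos` lies in a 0-based, half-open interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    pos0 = pos - 1  # convert a VCF-style 1-based position to 0-based
    i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def stratify(calls, strata):
    """Count [calls, errors] per stratification; a call may fall in several."""
    totals = {name: [0, 0] for name in strata}
    for chrom, pos, is_error in calls:
        for name, index in strata.items():
            if in_bed(index, chrom, pos):
                totals[name][0] += 1
                totals[name][1] += int(is_error)
    return totals


if __name__ == "__main__":
    # Toy intervals and calls for illustration only.
    strata = {
        "homopolymer": load_bed(["chr1\t100\t120"]),
        "segdup": load_bed(["chr1\t1000\t5000", "chr2\t0\t50"]),
    }
    calls = [("chr1", 105, True), ("chr1", 121, False),
             ("chr1", 2000, False), ("chr2", 50, True)]
    print(stratify(calls, strata))
```

Dividing errors by calls within each stratification then shows which genomic contexts a given pipeline struggles with.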
Large studies profiling microbial communities and their association with healthy or disease phenotypes are now commonplace. Processed data from many of these studies are publicly available, but significant effort is required for users to effectively organize, explore and integrate it, limiting the utility of these rich data resources. Effective integrative and interactive visual and statistical tools to analyze many metagenomic samples can greatly increase the value of these data for researchers. We present Metaviz, a tool for interactive exploratory data analysis of annotated...
Abstract The repetitive nature and complexity of multiple medically important genes make them intractable to accurate analysis, despite the maturity of short-read sequencing, resulting in a gap in the clinical applications of genome sequencing. The Genome in a Bottle Consortium has provided benchmark variant sets, but these excluded some medically relevant genes due to their repetitiveness or polymorphic complexity. In this study, we characterize 273 of 395 challenging autosomal genes that have multiple implications for medical sequencing. This extended,...
Abstract The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has greatly benefited society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome since it is a blend of multiple individuals [3,4]. Recently, a telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a duplicate genome and is thus nearly homozygous [5]. To address these limitations, the Human...
Abstract The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first to be explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line (HG008-T) and normal cells from duodenal tissue (HG008-N-D) and pancreatic tissue (HG008-N-P). The data come from thirteen whole...