Adam M. Phillippy
- Genomics and Phylogenetic Studies
- Chromosomal and Genetic Variations
- RNA and protein synthesis mechanisms
- Bacteriophages and microbial interactions
- CRISPR and Genetic Engineering
- Plant Virus Research Studies
- Genomic variations and chromosomal abnormalities
- Genomics and Chromatin Dynamics
- Genetic diversity and population structure
- Genetics, Bioinformatics, and Biomedical Research
- Mosquito-borne diseases and control
- Genomics and Rare Diseases
- Vibrio bacteria research studies
- Genetic Mapping and Diversity in Plants and Animals
- Antibiotic Resistance in Bacteria
- Microbial infections and disease research
- Algorithms and Data Compression
- Insect symbiosis and bacterial influences
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- Gene expression and cancer classification
- Microbial Community Ecology and Physiology
- Vaccines and immunoinformatics approaches
- RNA modifications and cancer
- Genetic and phenotypic traits in livestock
- Bioinformatics and Genomic Networks
National Institutes of Health
2016-2025
National Human Genome Research Institute
2016-2025
ORCID
2021
University of Maryland, College Park
2005-2014
Battelle
2011
University of Maryland, Baltimore
2005-2011
Technische Universität Berlin
2005
Johns Hopkins University
2005
Biotechnology Institute
2005
George Washington University
2005
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements,...
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The system is the first to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The sources are available at http://www.tigr.org/software/mummer.
A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes, or whether clear species boundaries prevail instead. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) help address this question by facilitating high-resolution taxonomic analysis of thousands of genomes from diverse phylogenetic lineages. To scale to available genomes and beyond, we present FastANI, a new method to estimate ANI using alignment-free approximate sequence mapping. FastANI is accurate for...
Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences and sequence sets to small, representative sketches, from which global mutation distances can be rapidly estimated. We demonstrate several use cases, including the clustering of all 54,118 NCBI RefSeq genomes in 33 CPU h; real-time database search using assembled or unassembled Illumina, Pacific Biosciences, and Oxford...
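The sketch-and-distance idea can be illustrated in a few lines. This is a minimal sketch, not Mash's implementation: it assumes plain (non-canonical) k-mers, swaps in `hashlib.blake2b` for the MurmurHash3 hashing Mash uses, and the function names are hypothetical. It builds a bottom-s MinHash sketch, estimates the Jaccard index j from the s smallest hashes of the union of two sketches, and converts j to the Mash distance D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math


def kmer_hashes(seq: str, k: int) -> set[int]:
    """Hash every k-mer of seq to a 64-bit integer.

    Assumption: blake2b stands in for MurmurHash3, and k-mers are not
    canonicalized against their reverse complements.
    """
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def sketch(seq: str, k: int, s: int) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest k-mer hash values."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate Jaccard from the s smallest hashes of the union of two sketches."""
    set_a, set_b = set(a), set(b)
    union_bottom = sorted(set_a | set_b)[:s]
    if not union_bottom:
        return 0.0
    # A hash among the s smallest of the union that lies in both sets
    # is guaranteed to appear in both bottom-s sketches.
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); 1.0 when nothing is shared."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

For example, sketching the same sequence twice gives j = 1.0 and a distance of 0, while two sequences with no shared k-mers get the maximum distance of 1.0. Comparing only the small sketches, rather than full k-mer sets, is what lets this kind of estimate scale to collections the size of RefSeq.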
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be...
Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch...
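The k-mer comparison behind the accuracy and completeness estimates can be sketched as follows. This is a simplified illustration under stated assumptions, not Merqury's code: k-mers are not canonicalized, every read k-mer is treated as reliable (no count-based filtering), and the function names are hypothetical. An assembly k-mer absent from the reads is taken as evidence of an error; with K_asm_only such k-mers out of K_total, the per-base error rate is estimated as E = 1 - (1 - K_asm_only / K_total)^(1/k) and reported as QV = -10 log10(E).

```python
import math


def kmer_list(seq: str, k: int) -> list[str]:
    """All k-mers of seq, with multiplicity (no reverse-complement handling)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_qv(assembly: str, read_kmers: set[str], k: int) -> float:
    """Estimate a consensus quality value from assembly-only k-mers.

    Assumption: read_kmers already holds only trusted read k-mers.
    """
    asm = kmer_list(assembly, k)
    k_total = len(asm)
    k_asm_only = sum(1 for km in asm if km not in read_kmers)
    if k_asm_only == 0:
        return math.inf  # no detectable errors at this k
    # Probability a base is correct ~ (fraction of supported k-mers)^(1/k).
    error = 1 - (1 - k_asm_only / k_total) ** (1 / k)
    return -10 * math.log10(error)


def kmer_completeness(assembly: str, read_kmers: set[str], k: int) -> float:
    """Fraction of trusted read k-mers that also occur in the assembly."""
    return len(set(kmer_list(assembly, k)) & read_kmers) / len(read_kmers)
```

A single substitution in a toy assembly creates up to k k-mers that are missing from the reads. Those k-mers lower the QV from infinity to a finite value and drop the completeness below 1. Real data also needs careful k selection and filtering of low-count read k-mers, and this sketch omits both.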
A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within complex...
The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems, including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially...
A human genome is sequenced and assembled de novo using a pocket-sized nanopore device. We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb,...
Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination...
We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more...
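MUMmer's name comes from maximal unique matches (MUMs): substrings that occur exactly once in each of two sequences and cannot be extended left or right. The suffix tree is what lets MUMmer find these in time linear in genome size. The brute-force sketch below shows only *what* is being computed. It is a quadratic stand-in, usable only on toy inputs, and it does not reproduce the suffix-tree method. Function names are hypothetical.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 3) -> list[tuple[int, int, int]]:
    """Return (pos_in_a, pos_in_b, length) for each maximal unique match.

    Brute force over all position pairs; a suffix tree does this in linear time.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip mismatches and matches that extend one base to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            # Extend to the right as far as the sequences agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            # Keep the maximal match only if it is unique in both sequences.
            match = a[i:i + length]
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums
```

For instance, `find_mums("ACGTTTGCA", "ACGTTTGCA")` reports the single full-length match `(0, 0, 9)`. By contrast, `find_mums("ACGTTACGT", "ACGT")` reports nothing, because `ACGT` occurs twice in the first sequence. MUMs like these serve as anchors that a full aligner then chains and extends.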
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted updates and used data from mapping technologies and haplotype...
New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin....
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions...
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist[1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38[2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for...
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain higher than with alternate scaffolding technologies. We present a scaffolder that does not require...
Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu,...
The MUMmer sequence alignment package is a suite of computer programs designed to detect regions of homology in long biological sequences. Version 2.1 makes several improvements to the package, including: increased speed and reduced memory requirements; the ability to handle both protein and DNA sequences; the alignment of multiple fragments; and new algorithms for clustering together basic matches. The system is particularly efficient at comparing highly similar sequences, such as alternative versions of fragment assemblies or...
Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. We present a new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase...
Low-cost short-read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware...
Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number...