Jason Miller
- Genomics and Phylogenetic Studies
- Stochastic processes and statistical mechanics
- Mathematical Dynamics and Fractals
- Geometry and complex manifolds
- RNA and protein synthesis mechanisms
- Black Holes and Theoretical Physics
- Chromosomal and Genetic Variations
- Genomics and Rare Diseases
- CRISPR and Genetic Engineering
- Theoretical and Computational Physics
- Plant Molecular Biology Research
- Cancer-related molecular mechanisms research
- Gut microbiota and health
- Markov Chains and Monte Carlo Methods
- Geometric Analysis and Curvature Flows
- Insect Resistance and Genetics
- Plant nutrient uptake and metabolism
- Noncommutative and Quantum Gravity Theories
- Traumatic Brain Injury Research
- Genetic diversity and population structure
- Insect symbiosis and bacterial influences
- Data Management and Algorithms
- Molecular Biology Techniques and Applications
- Gene expression and cancer classification
- Plant Virus Research Studies
Shepherd University
2018-2024
University of Oslo
2024
West Virginia University
2024
Hood College
2024
University of Cambridge
2013-2024
Warren Wilson College
2024
Orthopaedic Center
2022
Phoenixville Hospital
2022
West Chester University
2022
Temple University
2022
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements,...
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function....
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred...
We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at approximately 1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of approximately 4 to 6 increase in average gene length and in sizes of intergenic regions relative to An. gambiae and Drosophila melanogaster. Nonetheless, chromosomal synteny is generally maintained among all three insects, although conservation...
The whole-genome duplication 80 million years ago of the common ancestor of salmonids (salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons...
News from the Inner Tube of Life. A major initiative by the U.S. National Institutes of Health to sequence 900 genomes of microorganisms that live on the surfaces and orifices of the human body has established standardized protocols and methods for such large-scale reference sequencing. By combining previously accumulated data with new data, Nelson et al. (p. 994) present an initial analysis of 178 bacterial genomes. The sampling so far barely scratches the surface of the microbial diversity found on humans, but the work provides...
Motivation: DNA sequence reads from Sanger and pyrosequencing platforms differ in cost, accuracy, typical coverage, average read length and the variety of available paired-end protocols. Both read types can complement one another in a ‘hybrid’ approach to whole-genome shotgun sequencing projects, but assembly software must be modified to accommodate their different characteristics. This is true even of pyrosequencing mated and unmated read combinations. Without special modifications, assemblers tuned for homogeneous sequence data...
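Sequencing of the bonobo genome shows that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than those genomes are to each other. The bonobo and the chimpanzee are our species' two closest living relatives. This paper reports the genome sequence of the bonobo, the last ape to be sequenced. Comparative genomic analyses reveal that more than 3% of the human genome is more closely related to either the bonobo or the chimpanzee genome than those genomes are to each other; these results shed light on the ancestry of the species and might eventually help us understand the genetic basis of phenotypes that humans share with one or the other species. Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes)...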
Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ∼57% of the genome reveals 20,486 protein-coding genes...
A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring...
Helicoverpa armigera and Helicoverpa zea are major caterpillar pests of Old and New World agriculture, respectively. Both, particularly H. armigera, are extremely polyphagous, and H. armigera has developed resistance to many insecticides. Here we use comparative genomics, transcriptomics and resequencing to elucidate the genetic basis for their properties as pests.
The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release 'modules' that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining,...
The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about 10% smaller than the human genome, owing to a lower repetitive DNA content. Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22. Gene content and order are highly conserved between Mmu 16 and the syntenic blocks of the human genome. Of the 731 predicted...
The Tasmanian devil (Sarcophilus harrisii) is threatened with extinction because of a contagious cancer known as Devil Facial Tumor Disease. The inability to mount an immune response and to reject these tumors might be caused by a lack of genetic diversity within a dwindling population. Here we report a whole-genome analysis of two animals originating from extreme northwest and southeast Tasmania, the maximal geographic spread, together with the genome from a tumor taken from one of them. A 3.3-Gb de novo assembly of the sequence data...
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality,...
The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enable the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio...
Previous studies exploring sequence variation in the model legume, Medicago truncatula, relied on mapping short reads to a single reference. However, read-mapping approaches are inadequate to examine large, diverse gene families or to probe variation in repeat-rich or highly divergent genome regions. De novo sequencing and assembly of M. truncatula genomes enables near-comprehensive discovery of structural variants (SVs), analysis of rapidly evolving gene families, and ultimately, construction of a pan-genome. Genome-wide synteny...
Motivation: We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns...
Background: In order to maintain genome information accurately and relevantly, original genome annotations need to be updated and evaluated regularly. Manual reannotation of genomes is important as it can significantly reduce the propagation of errors and consequently diminishes the time spent on mistaken research. For this reason, after five years from the initial submission of the Entamoeba histolytica draft genome publication, we have re-examined the original 23 Mb assembly and the annotation of the predicted genes. Principal Findings: The evaluation of the genomic...
Chronic traumatic encephalopathy (CTE) is a neurodegenerative disease that has been neuropathologically diagnosed in brain donors exposed to repetitive head impacts, including boxers and American football, soccer, ice hockey, and rugby players. CTE cannot yet be diagnosed during life. In December 2015, the National Institute of Neurological Disorders and Stroke awarded a seven-year grant (U01NS093334) to fund the "Diagnostics, Imaging, and Genetics Network for the Objective Study and Evaluation of Chronic Traumatic Encephalopathy (DIAGNOSE...
Third generation sequencing technologies, with sequencing reads in the tens of kilobases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes...
Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. We developed a hybrid assembly pipeline called "Alpaca" that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation. Compared to two other assembly protocols, Alpaca demonstrated the most...
The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, and reveals independent null mutations...