- Genomics and Phylogenetic Studies
- Cell Image Analysis Techniques
- Bacteriophages and microbial interactions
- RNA and protein synthesis mechanisms
- Bioinformatics and Genomic Networks
- Algorithms and Data Compression
- Gut microbiota and health
- Genetic diversity and population structure
- Animal Virus Infections Studies
- Animal testing and alternatives
- Evolution and Genetic Dynamics
- Metabolomics and Mass Spectrometry Studies
- Caching and Content Delivery
- Gene expression and cancer classification
- Chromosomal and Genetic Variations
- SARS-CoV-2 and COVID-19 Research
- Spaceflight effects on biology
- Environmental Science and Technology
- Evolution and Paleontology Studies
- Antibiotic Resistance in Bacteria
- Viral gastroenteritis research and epidemiology
- Plant and fungal interactions
- SARS-CoV-2 detection and testing
- Actinomycetales infections and treatment
- Fractal and DNA sequence analysis
Rice University
2017-2024
University of Houston
2020
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there are a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 lineages that have spread throughout the world. In this study, we investigated 129 RNA-seq data sets and 6928 consensus genomes to contrast the intra-host and inter-host diversity of SARS-CoV-2. Our analyses yielded three major observations....
A common outcome of antibiotic exposure in patients and in vitro is the evolution of a hypermutator phenotype that enables rapid adaptation by pathogens. While hypermutation is a robust mechanism for rapid adaptation, it requires trade-offs between the adaptive mutations and the more common "hitchhiker" mutations that accumulate from the increased mutation rate. Using quantitative experimental evolution, we examined the role of hypermutation in driving the adaptation of Pseudomonas aeruginosa to colistin. Metagenomic deep sequencing revealed 2,657 mutations at ≥5% frequency in 1,197 genes and 761 mutations in 29...
The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show our ensemble machine learning model can label protein-coding sequences with...
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there are a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 clades that have spread throughout the world. In this study, we investigated over 7,000 datasets to unveil both intrahost and interhost diversity. Our diversity analyses yielded three major observations. First, the mutational profile...
As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets...
Computational analysis of host-associated microbiomes has opened the door to numerous discoveries relevant to human health and disease. However, contaminant sequences in metagenomic samples can potentially impact the interpretation of findings reported in microbiome studies, especially in low-biomass environments. Contamination from DNA extraction kits or sampling lab environments leaves taxonomic "bread crumbs" across multiple distinct sample types. Here we describe Squeegee, a de novo contamination...
Viruses of concern for quantitative wastewater monitoring are usually selected as a result of an outbreak and subsequent detection in wastewater. In addition, targeted metagenomics could proactively be used for widespread identification and sequencing of viruses when used as an initial screening tool. To evaluate the utility of targeted metagenomics for screening, we used ViroCap, a panel of probes designed to target all known vertebrate viruses. Untreated wastewater was collected from wastewater treatment plants (WWTPs) and building-level manholes associated with vulnerable...
Nocardia spp. are Gram-positive opportunistic pathogens that affect largely immunocompromised patients, leading to serious pulmonary or systemic infections. Combination therapy using the folate biosynthesis pathway inhibitors trimethoprim (TMP) and sulfamethoxazole (SMX) is commonly used as an antimicrobial therapy. Not surprisingly, as antibiotic therapies for nocardiosis can extend for many months, resistance to TMP-SMX has emerged. Using experimental evolution, we surveyed the genetic basis of...
With advances in synthetic biology and genome engineering comes a heightened awareness of potential misuse related to biosafety concerns. A recent study employed machine learning to identify the lab-of-origin of DNA sequences to help mitigate some of these concerns. Despite their promising results, this deep learning based approach had limited accuracy, was computationally expensive to train, and wasn't able to provide the precise features that were used in its predictions. To address these shortcomings, we developed PlasmidHawk for lab-of-origin prediction....
Characterizing metagenomes via kmer-based, database-dependent taxonomic classification has yielded key insights into underlying microbiome dynamics. However, novel approaches are needed to track community dynamics and genomic flux within metagenomes, particularly in response to perturbations. We describe KOMB, a method for tracking genome level dynamics within microbiomes. KOMB utilizes K-core decomposition to identify Structural variations (SVs), specifically, population-level Copy Number Variation (CNV)...
When two species hybridize, one outcome is the integration of genetic material from one species into the genome of the other, a process known as introgression. Detecting introgression in genomic data is a very important question in evolutionary biology. However, given that hybridization occurs between closely related species, a complicating factor for introgression detection is the presence of incomplete lineage sorting, or ILS. The D-statistic, famously referred to as the “ABBA-BABA” test, was proposed for introgression detection in the presence of ILS in data sets that consist of four genomes. More...
DNA sequencing, especially of microbial genomes and metagenomes, has been at the core of recent research advances in large-scale comparative genomics. The data deluge has resulted in exponential growth in genomic datasets over the past years and has shown no sign of slowing down. Several attempts have been made to tame the computational burden of sequence search on these terabyte and petabyte-scale datasets, including raw reads and assembled genomes. However, no known implementation provides both fast query and construction time, keeps low...
DNA synthesis technologies are enabling rapid advancements in the field of synthetic biology, which involves the design and fabrication of novel biological components. The immense promise of the technology is unmistakable, but so is its potential for intentional or accidental misuse. In the interest of biosecurity, the United States Department of Health and Human Services (HHS) issued the Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA in 2010, which calls on commercial providers of synthetic double-stranded DNA (dsDNA) to voluntarily...
Rapid advancements in synthetic biology and nucleic acid synthesis, in particular concerns about its intentional or accidental misuse, call for more sophisticated screening tools to identify genes of interest within short sequence fragments. One major gap in predicting genes of concern is the inadequacy of current ontologies to describe the specific biological processes of pathogenic proteins. The objective of this work is to design software that sensitively assigns taxonomic classifications, functional annotations, nucleotide...
The rise of whole-genome shotgun sequencing (WGS) has enabled numerous breakthroughs in large-scale comparative genomics research. However, the size of genomic datasets has grown exponentially over the last few years, leading to new challenges for traditional streaming algorithms. Modern petabyte-sized genomic datasets are difficult to process because they are delivered by high-throughput data streams and are difficult to store. As a result, many traditional streaming problems are becoming increasingly relevant. One such problem is the task of constructing a maximally...
The COVID-19 pandemic has emphasized the importance of detecting known and emerging pathogens from clinical and environmental samples. However, robust characterization of pathogenic sequences remains an open challenge. To this end, we developed SeqScreen, which can accurately characterize short nucleotide sequences using taxonomic and functional labels, and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show our ensemble machine learning model can label...
Viruses of concern for quantitative wastewater monitoring are usually selected as a result of an outbreak and subsequent detection in wastewater. However, targeted metagenomics could proactively identify viruses when used as an initial screening tool. To evaluate the utility of targeted metagenomics for screening, we used ViroCap, a panel of probes designed to target all known vertebrate viruses. Untreated wastewater was collected from wastewater treatment plants (WWTPs) and building-level manholes associated with vulnerable populations in Houston, TX. We evaluated...
A systems biology approach was implemented utilizing NASA's GeneLab platform involving experiments from samples flown in space, human physiological data from astronauts, and confirmation from the NASA Twin Study. A comprehensive multi-omics approach was implemented, correlating transcriptomics, proteomics, metabolomics, and methylation analysis. We found that cells have a stronger overall biological response than the tissues to spaceflight, with mitochondrial activity and innate immunity pathways being heavily impacted. Study results are...
Tiled amplicon sequencing has served as an essential tool for tracking the spread and evolution of pathogens. Over 15 million complete SARS-CoV-2 genomes are now publicly available, most sequenced and assembled via tiled amplicon sequencing. While computational tools for tiled amplicon design exist, they require downstream manual optimization both computationally and experimentally, which is slow and costly. Here we present Olivar, a first step towards a fully automated, variant-aware design of tiled amplicons for pathogen genomes. Olivar...
Tiled amplicon sequencing has served as an essential tool for tracking the spread and evolution of pathogens. Over 2 million complete SARS-CoV-2 genomes are now publicly available, most sequenced and assembled via tiled amplicon sequencing. While computational tools for tiled amplicon design exist, they require downstream manual optimization both computationally and experimentally, which is slow and costly. Here we present Olivar, a first step towards a fully automated, variant-aware design of tiled amplicons for pathogen genomes. Olivar converts each...