- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Chromosomal and Genetic Variations
- Machine Learning in Bioinformatics
- Algorithms and Data Compression
- Reservoir Engineering and Simulation Methods
- Bioinformatics and Genomic Networks
- Bacteriophages and microbial interactions
- Computational Drug Discovery Methods
- Natural Language Processing Techniques
- Hydraulic Fracturing and Reservoir Analysis
- RNA modifications and cancer
- Gene expression and cancer classification
- Genetics, Bioinformatics, and Biomedical Research
- Microbial Metabolic Engineering and Bioproduction
- Enhanced Oil Recovery Techniques
- Scientific Computing and Data Management
- Distributed and Parallel Computing Systems
- Glycosylation and Glycoproteins Research
- Advanced Proteomics Techniques and Applications
- DNA and Biological Computing
- Chemical Reactions and Isotopes
- Soil Geostatistics and Mapping
- Genetic Mapping and Diversity in Plants and Animals
- Cancer Mechanisms and Therapy
University of Arizona
2005-2025
University of Montana
2015-2024
University of Pittsburgh
2023
Janelia Research Campus
2010-2015
Helix (United States)
2015
ConocoPhillips (Canada)
2005-2014
ConocoPhillips (United States)
2005-2014
Howard Hughes Medical Institute
2010-2012
Genetic Information Research Institute
2012
Institute for Systems Biology
2012
The HMMER website, available at http://www.ebi.ac.uk/Tools/hmmer/, provides access to the protein homology search algorithms found in the HMMER software suite. Since the first release of the website in 2011, the search repertoire has been expanded to include the iterative search algorithm, jackhmmer. The continued growth of the target sequence databases means that traditional tabular representations of significant hits can be overwhelming to the user. Consequently, additional ways of presenting results have been developed, allowing them to be summarised according...
Summary: Sequence database searches are an essential part of molecular biology, providing information about the function and evolutionary history of proteins, RNA molecules and DNA sequence elements. We present a tool for DNA/DNA sequence comparison that is built on the HMMER framework, which applies probabilistic inference methods based on hidden Markov models to the problem of homology search. This tool, called nhmmer, enables improved detection of remote DNA homologs, and has been used in combination with Dfam...
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we...
Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0–3.3 releases of Dfam (https://dfam.org) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of families represented...
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural...
Logos are commonly used in molecular biology to provide a compact graphical representation of the conservation pattern of a set of sequences. They render the information contained in sequence alignments or profile hidden Markov models by drawing a stack of letters for each position, where the height of the stack corresponds to the conservation at that position, and the height of each letter within a stack depends on the frequency of that letter at that position. We present a new tool and web server, called Skylign, which provides a unified framework for creating logos for both sequence alignments and profile hidden Markov models. In addition to static image files, Skylign...
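As an illustration of the stack-and-letter heights described in the abstract above, here is a minimal sketch of the classic information-content logo calculation for a DNA alignment. It is an assumption that this particular scheme matches what any given tool draws; the function name `logo_heights` and the toy alignment are hypothetical, and no small-sample correction or background model is applied.

```python
import math
from collections import Counter


def logo_heights(alignment, alphabet="ACGT"):
    """Return, for each alignment column, a dict mapping letter -> height in bits.

    Stack height is the column's information content (max entropy minus
    observed entropy); each letter's height is its frequency times that.
    """
    n_cols = len(alignment[0])
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    columns = []
    for j in range(n_cols):
        # Count only residues from the alphabet (gaps and other symbols are skipped).
        counts = Counter(seq[j] for seq in alignment if seq[j] in alphabet)
        total = sum(counts.values())
        if total == 0:
            columns.append({})
            continue
        freqs = {a: counts[a] / total for a in alphabet if counts[a]}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = max_bits - entropy  # height of the whole stack
        columns.append({a: p * info for a, p in freqs.items()})
    return columns


# Hypothetical toy alignment: four sequences, four columns.
toy = ["ACGT", "ACGA", "ACTA", "AGTA"]
heights = logo_heights(toy)
for position, stack in enumerate(heights, start=1):
    print(position, {letter: round(h, 3) for letter, h in stack.items()})
```

A fully conserved column yields a single letter of the maximum height (2 bits for DNA), while a mixed column yields a shorter stack split in proportion to the letter frequencies.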
We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as...
For 15 years the mission of PhosphoSitePlus® (PSP, https://www.phosphosite.org) has been to provide comprehensive information and tools for the study of mammalian post-translational modifications (PTMs). The number of unique PTMs in PSP is now more than 450 000 from over 22 000 articles and thousands of MS datasets. The most important areas of growth in PSP are in disease and isoform informatics. Germline mutations associated with inherited diseases and somatic cancer mutations have been added to the database and can be viewed along with PTMs and associated quantitative information on novel...
Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located...
Motivation: Multiple sequence alignment is a fundamental task in bioinformatics. Current tools typically form an initial alignment by merging subalignments, and then polish this alignment by repeated splitting and merging of subalignments to obtain an improved final alignment. In general this form-and-polish strategy consists of several stages, and a profusion of methods have been tried at every stage. We carefully investigate: (1) how to utilize a new algorithm for aligning alignments that optimally solves the common subproblem of merging subalignments, and (2) what...
Immune checkpoint inhibitors (ICIs) have changed the treatment paradigm for many cancers but have not shown benefit in prostate cancer (PCa). Chronic inflammation contributes to the immunosuppressive tumor microenvironment (TME) and is associated with poor response to ICIs. The primary source of inflammatory cytokine production is the inflammasome. Here, we identify PIM kinases as regulators of inflammasome activation in tumor-associated macrophages (TAMs). Analysis of clinical data from a cohort of naïve,...
DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks, that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence...
Nucleomorphs are the remnant nuclei of algal endosymbionts that were engulfed by nonphotosynthetic host eukaryotes. These peculiar organelles are found in cryptomonad and chlorarachniophyte algae, where they evolved from red and green algal endosymbionts, respectively. Despite their independent origins, nucleomorph genomes are similar in size and structure: both are <1 million base pairs in size (the smallest nuclear genomes known), comprised of three chromosomes, and possess subtelomeric ribosomal DNA operons. Here, we report the complete...
Integrated multiomics network analysis reveals signaling profiles in lung cancer.
The reconstruction of complete microbial metabolic pathways using 'omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets. Using gene annotation data and information from the KEGG module database, MetaPathPredict employs deep learning models to predict the presence of KEGG modules in a genome. MetaPathPredict can be used as...
Software for labeling biological sequences typically produces a theory-based statistic for each match (the E-value) that indicates the likelihood of seeing that match's score by chance. E-values accurately predict the false match rate for comparisons of random (shuffled) sequences, and thus provide a reasoned mechanism for setting score thresholds that enable high sensitivity with a low expected false match rate. This threshold-setting strategy is challenged by real biological sequences, which contain regions of local repetition and low sequence complexity that cause excess matches...
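To make the threshold-setting idea above concrete, here is a minimal sketch that turns a match score into an E-value and inverts the relation to pick a score cutoff. It assumes, purely for illustration, that null scores follow a Gumbel (extreme value) distribution; the parameter values `mu` and `lam`, the database size, and all function names are hypothetical and not taken from any particular tool.

```python
import math


def gumbel_tail(score, mu, lam):
    """P(S >= score) for a Gumbel null with location mu and rate lam (assumed model)."""
    # -expm1(-x) computes 1 - exp(-x) accurately for tiny x.
    return -math.expm1(-math.exp(-lam * (score - mu)))


def e_value(score, mu, lam, n_targets):
    """Expected number of chance matches scoring >= score among n_targets comparisons."""
    return n_targets * gumbel_tail(score, mu, lam)


def score_threshold(max_e, mu, lam, n_targets):
    """Score at which the E-value equals max_e (requires 0 < max_e < n_targets)."""
    p = max_e / n_targets
    return mu - math.log(-math.log1p(-p)) / lam


# Hypothetical null parameters and database size.
MU, LAM, N = -5.0, math.log(2), 1_000_000
cutoff = score_threshold(0.01, MU, LAM, N)
print(f"score cutoff for E <= 0.01: {cutoff:.2f}")
print(f"E-value at cutoff: {e_value(cutoff, MU, LAM, N):.4g}")
```

The sketch only covers the idealised random-sequence case; it does not model the repetitive or low-complexity regions that the abstract identifies as the source of excess matches.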
In biological sequences, tandem repeats consist of tens to hundreds of residues of a repeated pattern, such as atgatgatgatgatg ('atg' repeated), often the result of replication slippage. Over time, these repeats decay so that the original sharp pattern of repetition is somewhat obscured, but even degenerate repeats pose a problem for sequence annotation: when two sequences both contain shared patterns of similar repetition, the result can be a false signal of homology. We describe an implementation of a new hidden Markov model for detecting tandem repeats that shows...
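The 'atg' example above can be reproduced with a minimal sketch that finds exact tandem repeats using a regular-expression backreference. This is a deliberately naive stand-in, not the hidden Markov model the abstract describes: it misses the decayed (degenerate) repeats that motivate that model. The function name `find_tandem_repeats` and its default parameters are hypothetical.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=4):
    """Find exact tandem repeats in a DNA string.

    Returns a list of (start, end, unit, copies) tuples with 0-based,
    end-exclusive coordinates. Only perfect repeats are reported.
    """
    seq = seq.lower()
    # Lazy group: the shortest repeating unit is tried first, then the
    # backreference \1 must occur at least (min_copies - 1) more times.
    pattern = re.compile(
        rf"([acgt]{{1,{max_unit}}}?)\1{{{min_copies - 1},}}"
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


example = "cc" + "atg" * 5 + "cc"
print(find_tandem_repeats(example))
```

Because the match requires identical copies, a single substitution inside the repeat splits or hides the hit, which is the limitation a probabilistic model is meant to address.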
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods...
We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the annotation workflow for pHMM-based annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and produces...
"Fast is fine, but accuracy is final." -- Wyatt Earp
Transposable elements are ubiquitous mobile DNA sequences generating insertion polymorphisms, contributing to genomic diversity. We present GraffiTE, a flexible pipeline to analyze polymorphic insertions. By integrating state-of-the-art structural variant detection algorithms and graph genomes, GraffiTE identifies polymorphic insertions from assemblies or long-read sequencing data, and genotypes these variants using short or long read sets. Benchmarking on simulated and real datasets reports high precision and recall rates. GraffiTE is...
Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee. We find that standard methods annotate over 10% of replicates as belonging to different subfamilies, despite...