- Genomics and Chromatin Dynamics
- Advanced Proteomics Techniques and Applications
- Mass Spectrometry Techniques and Applications
- Machine Learning in Bioinformatics
- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Metabolomics and Mass Spectrometry Studies
- Gene expression and cancer classification
- Single-cell and spatial transcriptomics
- RNA Research and Splicing
- Bioinformatics and Genomic Networks
- Epigenetics and DNA Methylation
- Chromosomal and Genetic Variations
- RNA modifications and cancer
- Protein Structure and Dynamics
- Cancer Genomics and Diagnostics
- Cell Image Analysis Techniques
- Advanced Biosensing Techniques and Applications
- CRISPR and Genetic Engineering
- Scientific Computing and Data Management
- Cancer-related molecular mechanisms research
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- Genetics, Bioinformatics, and Biomedical Research
- Genomic variations and chromosomal abnormalities
- Fungal and yeast genetics research
University of Washington
2016-2025
Seattle University
2015-2025
Human Genome Sciences (United States)
2018-2019
Center for Innovation
2015
École Nationale Supérieure des Mines de Paris
2007-2014
Institut Curie
2014
Inserm
2014
University of Massachusetts Chan Medical School
2013
The University of Queensland
2009-2011
The University of Sydney
2011
The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm, which allows discovery of motifs containing gaps. Three sequence scanning algorithms (MAST, FIMO and GLAM2SCAN) allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using Tomtom. Motifs can be further analyzed for putative function...
Summary: A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with motifs. Critical to nearly any motif-based analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix. Results: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning sequences with motifs described as...
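A minimal sketch of the operation FIMO builds on: converting a position-specific frequency matrix into log-odds scores and scoring every window of a sequence. The pseudocount, the uniform background, the raw-score threshold and all function names are illustrative assumptions. FIMO itself reports p-values and q-values for each match and scans both DNA strands, which this sketch omits.

```python
import math

BASES = "ACGT"

def log_odds_matrix(freqs, background=None, pseudocount=0.1):
    """Convert a position-specific frequency matrix (one {base: frequency} dict
    per motif column) into log2-odds scores against a background model."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    matrix = []
    for column in freqs:
        total = sum(column[b] + pseudocount for b in BASES)
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                       for b in BASES})
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(matrix, window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical 3-column motif that strongly prefers "ACG".
motif = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
]
print(scan("TTACGTT", log_odds_matrix(motif), threshold=4.0))
```

Only the window starting at offset 2 ("ACG") clears the threshold here; a p-value step would replace the arbitrary threshold with a statistically calibrated one.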
The MEME Suite is a powerful, integrated set of web-based tools for studying sequence motifs in proteins, DNA and RNA. Such motifs encode many biological functions, and their detection and characterization is important in the study of molecular interactions in the cell, including the regulation of gene expression. Since the previous description of the MEME Suite in the 2009 Nucleic Acids Research Web Server Issue, we have added six new tools. Here we describe the capabilities of all the tools within the suite, give advice on their best use and provide several case studies to illustrate how...
A common question within the context of de novo motif discovery is whether a newly discovered, putative motif resembles any previously discovered motif in an existing database. To answer this question, we define a statistical measure of motif-motif similarity, and we describe an algorithm, called Tomtom, for searching a database of motifs with a given query motif. Experimental simulations demonstrate the accuracy of Tomtom's E values and its effectiveness in finding similar motifs.
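The raw ingredient of a motif-motif comparison can be sketched as an ungapped alignment: slide the query along the target and sum a column-comparison score over the overlapping columns. This sketch assumes Pearson correlation as the column comparison and ignores reverse complements; Tomtom's statistical step, which turns such scores into p-values and E values, is not reproduced. All names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def best_offset(query, target):
    """Slide query along target without gaps (forward orientation only) and return
    (offset, score), where score sums column correlations over the overlap."""
    best_off, best_score = None, float("-inf")
    # Every offset in this range leaves at least one overlapping column.
    for offset in range(-(len(query) - 1), len(target)):
        pairs = [(q, target[i + offset]) for i, q in enumerate(query)
                 if 0 <= i + offset < len(target)]
        score = sum(pearson(q, t) for q, t in pairs)
        if score > best_score:
            best_off, best_score = offset, score
    return best_off, best_score

# Columns hold [A, C, G, T] probabilities; the target is a hypothetical "ACGT" motif.
target = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
          [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]
query = target[1:3]  # the "CG" core
print(best_offset(query, target))
```

Because a summed score grows with overlap length, comparing raw scores across offsets and database motifs is biased; that is one reason a calibrated p-value is needed.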
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are conserved evolutionarily and show distinct footprints upon DNase...
During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data. This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each dataset is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible...
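A minimal sketch of the kernel idea: each dataset yields its own entity-by-entity kernel matrix, and a nonnegative weighted sum of (normalized) kernels is again a valid kernel that downstream methods such as SVMs can use. The two toy datasets, the kernel choices and the fixed weights are assumptions for illustration; choosing the weights from data is the harder part of such a framework and is not shown.

```python
import numpy as np

def linear_kernel(X):
    """Inner products between rows of X (one row per gene)."""
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def normalize(K):
    """Scale K so every entity has unit self-similarity (assumes nonzero diagonal)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine(kernels, weights):
    """Nonnegative weighted sum of normalized kernels; stays positive semidefinite."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be nonnegative")
    return sum(w * normalize(K) for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
expression = rng.normal(size=(5, 3))   # hypothetical: 5 genes x 3 conditions
features = rng.normal(size=(5, 4))     # hypothetical second data source
K = combine([linear_kernel(expression), rbf_kernel(features)], [0.6, 0.4])
print(np.round(K, 3))
```

Normalizing first keeps one data source from dominating simply because its raw values are larger.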
Motivation: Despite advances in high-throughput methods for discovering protein–protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions.
The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements...
Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification and remote...
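A brute-force sketch of a (k, m)-mismatch kernel: each sequence is mapped to counts over all length-k strings, where every k-mer in the sequence contributes to each string within m mismatches of it, and the kernel is the inner product of the two count vectors. Enumerating neighborhoods explicitly is only practical for small k and m, and the published kernels are typically normalized before use with an SVM; both points, and all names here, are simplifications and assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mismatch_neighborhood(kmer, m, alphabet):
    """All strings of length len(kmer) within Hamming distance m of kmer."""
    m = min(m, len(kmer))
    neighbors = set()
    for positions in combinations(range(len(kmer)), m):
        # Overwriting m positions with any letter (including the original one)
        # covers every string with at most m mismatches.
        for letters in product(alphabet, repeat=m):
            chars = list(kmer)
            for pos, letter in zip(positions, letters):
                chars[pos] = letter
            neighbors.add("".join(chars))
    return neighbors

def mismatch_features(seq, k, m, alphabet):
    """For each k-mer beta, count the k-mers of seq lying within m mismatches of beta."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for beta in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            features[beta] += 1
    return features

def mismatch_kernel(s, t, k, m, alphabet=AMINO_ACIDS):
    """(k, m)-mismatch kernel: inner product of the two feature maps."""
    fs = mismatch_features(s, k, m, alphabet)
    ft = mismatch_features(t, k, m, alphabet)
    return sum(count * ft[beta] for beta, count in fs.items())

print(mismatch_kernel("ACGT", "ACGA", k=2, m=0, alphabet="ACGT"))
print(mismatch_kernel("AA", "AA", k=2, m=1, alphabet="ACGT"))
```

With m = 0 the kernel reduces to counting shared k-mers (AC and CG in the first call); allowing mismatches lets similar but non-identical subsequences contribute, which is what makes the kernel useful for remote homologs.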
Our current understanding of how DNA is packed in the nucleus is most accurate at the fine scale of individual nucleosomes and at the large scale of chromosome territories. However, modeling DNA architecture at the intermediate scale of ∼50 kb–10 Mb is crucial for identifying functional interactions among regulatory elements and their target promoters. We describe a method, Fit-Hi-C, that assigns statistical confidence estimates to mid-range intra-chromosomal contacts by jointly modeling the random polymer looping effect and previously observed technical...
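A simplified sketch of distance-aware significance for Hi-C contacts: estimate a prior contact probability as a function of genomic distance (the random polymer looping effect), then ask how surprising each observed count is under a binomial model. The published method fits a smooth spline over distance bins and also models technical biases; this sketch instead averages counts at each exact distance and ignores biases. All names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing log-space terms."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    tail = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        tail += math.exp(log_term)
    return min(tail, 1.0)

def contact_pvalues(contacts, n_bins):
    """contacts: {(i, j): count} for intra-chromosomal bin pairs with i < j.
    The prior p(d) spreads the total count at distance d evenly over the
    n_bins - d possible pairs at that distance."""
    total = sum(contacts.values())
    at_distance = defaultdict(int)
    for (i, j), count in contacts.items():
        at_distance[j - i] += count
    prior = {d: s / (total * (n_bins - d)) for d, s in at_distance.items()}
    return {(i, j): binom_sf(count, total, prior[j - i])
            for (i, j), count in contacts.items()}

contacts = {(i, i + 1): 5 for i in range(9)}
contacts[(3, 4)] = 40                      # hypothetical enriched contact
contacts.update({(i, i + 2): 2 for i in range(8)})
pvals = contact_pvalues(contacts, n_bins=10)
print(pvals[(3, 4)], pvals[(0, 1)])
```

Comparing each pair only against pairs at the same distance is the key design choice: without it, nearby bins would always look significant simply because close loci contact each other more often.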
Sequence census methods like ChIP-seq now produce an unprecedented amount of genome-anchored data. We have developed an integrative method to identify patterns from multiple experiments simultaneously while taking full advantage of high-resolution data, discovering joint patterns across different assay types. We apply this method to ENCODE chromatin data for the human chronic myeloid leukemia cell line K562, including ChIP-seq data on covalent histone modifications and transcription factor binding, and DNase-seq and FAIRE-seq readouts of open...
Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only...