- Genomics and Chromatin Dynamics
- Genomics and Phylogenetic Studies
- Epigenetics and DNA Methylation
- RNA and protein synthesis mechanisms
- Gene expression and cancer classification
- Tuberculosis Research and Epidemiology
- Bioinformatics and Genomic Networks
- Cell Image Analysis Techniques
- Single-cell and spatial transcriptomics
- Mycobacterium research and diagnosis
- Image Processing Techniques and Applications
- Machine Learning in Bioinformatics
- Cancer Genomics and Diagnostics
- Chromosomal and Genetic Variations
- RNA modifications and cancer
- Gene Regulatory Network Analysis
- Scientific Computing and Data Management
- Digital Imaging for Blood Diseases
- Tensor decomposition and applications
- Genetic Mapping and Diversity in Plants and Animals
- Algorithms and Data Compression
- Machine Learning and Algorithms
- Protein Degradation and Inhibitors
- RNA Research and Splicing
- Vaccines and immunoinformatics approaches
Simon Fraser University
2018-2024
University of Washington
2014-2023
University of British Columbia
2022
Seattle University
2014-2018
California Institute of Technology
2006
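A large collection of new modENCODE and ENCODE genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human are analysed; this reveals many conserved features of chromatin organization among the three organisms, as well as notable differences in the composition and locations of repressive chromatin. This study describes numerous new modENCODE and ENCODE genome-wide chromatin data sets from Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans generated by the modENCODE and ENCODE consortia. The results point to many conserved features of chromatin organization among the three organisms while identifying notable differences in the composition and locations of repressive chromatin. Genome function is dynamically regulated...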
The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project seek to characterize the epigenome in diverse cell types using assays that identify, for example, genomic regions with modified histones or accessible chromatin. These efforts have produced thousands of datasets but cannot possibly measure each epigenomic factor in all cell types. To address this, we present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to computationally impute missing...
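As a rough illustration of imputation by tensor decomposition (not PREDICTD's actual implementation, whose details are not given in the truncated abstract above), the sketch below models one entry of a cell type by assay by genomic position tensor as a sum of products of latent factors, and takes a single stochastic-gradient step on an observed entry. The function names, the rank, and the learning rate are hypothetical choices made for illustration; bias terms, regularization, and parallel training are omitted.

```python
from typing import List

Vector = List[float]


def predict(cell: Vector, assay: Vector, position: Vector) -> float:
    """Predicted signal: sum over latent dimensions of the three-way product."""
    return sum(c * a * g for c, a, g in zip(cell, assay, position))


def sgd_step(cell: Vector, assay: Vector, position: Vector,
             observed: float, learning_rate: float) -> float:
    """Update the three factor vectors in place for one observed entry.

    Minimizes 0.5 * (observed - prediction)^2 and returns the error
    measured before the update.
    """
    error = observed - predict(cell, assay, position)
    for r in range(len(cell)):
        # Read all three values first so the updates use the old factors.
        c, a, g = cell[r], assay[r], position[r]
        cell[r] += learning_rate * error * a * g
        assay[r] += learning_rate * error * c * g
        position[r] += learning_rate * error * c * a
    return error


# Rank-1 toy example: one cell type, one assay, one genomic position.
cell, assay, position = [1.0], [1.0], [1.0]
sgd_step(cell, assay, position, observed=2.0, learning_rate=0.1)
print(predict(cell, assay, position))  # closer to 2.0 than the initial 1.0
```

Once the factors have been fitted on the observed entries, calling `predict` on a combination of cell type, assay, and position that was never measured gives the imputed value.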
The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation. Previous studies have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the genome and assigning labels to the resulting segments. existing...
Recently, Hi-C has been used to probe the 3D chromatin architecture of multiple organisms and cell types. The resulting collections of pairwise contacts across the genome have connected chromatin architecture to many cellular phenomena, including replication timing and gene regulation. However, high resolution (10 kb or finer) contact maps remain scarce due to the expense and time required for collection. A computational method for predicting pairwise contacts without the need to run a Hi-C experiment would be invaluable in understanding the role that 3D chromatin architecture plays in biology....
With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread...
Eukaryotic genome duplication starts at discrete sequences (replication origins) that coordinate cell cycle progression, ensure genomic stability and modulate gene expression. Origins share some sequence features, but their activity also responds to changes in transcription and cellular differentiation status. To identify chromatin states and histone modifications that locally mark replication origins, we profiled origin distributions in eight human cell lines representing embryonic and differentiated cell types....
Semi-automated genome annotation methods such as Segway take as input a set of genome-wide measurements such as of histone modification or DNA accessibility and output an annotation of genomic activity in the target cell type. Here we present annotations of 164 human cell types using 1615 data sets. To produce these annotations, we automated the label interpretation step to produce a fully automated annotation strategy. Using these annotations, we developed a measure of the importance of each genomic position called the "conservation-associated activity score." We further combined all annotations into a single, cell type-agnostic...
Large-scale epigenomic datasets such as histone modifications and DNA accessibility have greatly advanced our understanding of genomic function. However, these measurements often suffer from noise, batch effects and irreproducibility. Epigenome imputation has emerged as a promising solution to these challenges. These methods integrate patterns across experiments, cell types, and loci to predict the results of experiments, yielding predictions that surpass observed data in quality. Thus, researchers increasingly leverage for...
Understanding the mechanistic basis of genetic disease requires annotating regulatory elements in the human genome. To this end, the International Human Epigenome Consortium (IHEC) has generated thousands of epigenomic datasets (including ChIP-seq, DNase-seq, and ATAC-seq) that measure various biochemical activities of the genome, including transcription factor binding, histone modification, and DNA accessibility. Currently, the predominant methods for integrating these data sets to annotate the genome are segmentation and genome...
Segway performs semi-automated genome annotation, discovering joint patterns across multiple genomic signal datasets. We discuss a major new version of Segway and highlight its ability to model data with substantially greater accuracy. Major enhancements in Segway 2.0 include the mixture of Gaussians, enabling capture of arbitrarily complex signal distributions, and minibatch training, leading to better learned parameters. Segway and its source code are freely available for download at http://segway.hoffmanlab.org. We have made scripts...
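A minimal sketch of why a mixture of Gaussians can describe signal distributions that a single Gaussian cannot. This is not Segway's code or API; the function names and the example parameters are assumptions made purely for illustration.

```python
import math
from typing import Sequence


def log_gaussian(x: float, mean: float, sd: float) -> float:
    """Log density of a univariate normal distribution."""
    z = (x - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z


def log_mixture_density(x: float, weights: Sequence[float],
                        means: Sequence[float], sds: Sequence[float]) -> float:
    """Log density of a Gaussian mixture, computed with log-sum-exp.

    The weights are assumed to be positive and to sum to one.
    """
    terms = [math.log(w) + log_gaussian(x, m, s)
             for w, m, s in zip(weights, means, sds)]
    top = max(terms)  # subtract the largest term for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


# A two-component mixture is bimodal: the density is high near each mean
# and low in between, which no single Gaussian can reproduce.
weights, means, sds = [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5]
print(log_mixture_density(2.0, weights, means, sds))
print(log_mixture_density(0.0, weights, means, sds))
```

Adding components lets the mixture approximate skewed or multi-modal signal, at the cost of more parameters to learn.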
The occurrence of multiple strains of a bacterial pathogen such as M. tuberculosis or C. difficile within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting it, and especially determining the proportion and identities of the underlying strains, from WGS (whole-genome sequencing) data, have been limited. In this paper we introduce SplitStrains, a novel method for addressing these challenges. Grounded in a rigorous statistical...
Due to the high cost of sequencing-based genomics assays such as ChIP-seq and DNase-seq, the epigenomic characterization of a cell type is typically carried out using a small panel of assay types. Deciding a priori which assays to perform is, thus, a critical step in many epigenomic studies. We present submodular selection of assays (SSA), a method for choosing a diverse panel of genomic assays that leverages methods from submodular optimization. More generally, this application serves as a model for how submodular optimization can be applied to other discrete problems in biology.
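The idea of choosing a diverse panel by submodular optimization can be sketched with the standard greedy algorithm on a facility-location objective. This is an illustrative sketch, not the SSA implementation: the choice of objective, the function names, and the toy similarity matrix are all assumptions, and the similarities are assumed to be non-negative.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def facility_location(similarity: Matrix, selected: List[int]) -> float:
    """Sum, over all items, of the similarity to the closest selected item."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in similarity)


def greedy_select(similarity: Matrix, k: int) -> List[int]:
    """Greedily add the item with the largest marginal gain, k times.

    Ties are broken in favor of the lower index.
    """
    n = len(similarity)
    selected: List[int] = []
    for _ in range(min(k, n)):
        current = facility_location(similarity, selected)
        best_gain, best_item = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location(similarity, selected + [j]) - current
            if gain > best_gain:
                best_gain, best_item = gain, j
        selected.append(best_item)
    return selected


# Hypothetical similarities between four assay types: 0 and 1 are
# near-redundant, as are 2 and 3.
similarity = [
    [1.0, 0.75, 0.25, 0.125],
    [0.75, 1.0, 0.125, 0.125],
    [0.25, 0.125, 1.0, 0.5],
    [0.125, 0.125, 0.5, 1.0],
]
print(greedy_select(similarity, 2))  # → [0, 2]
```

The greedy procedure picks one assay from each redundant pair rather than two similar ones. For monotone submodular objectives such as this one, greedy selection is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value.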
Selecting a non‐redundant representative subset of sequences is a common step in many bioinformatics workflows, such as the creation of non‐redundant training sets for sequence and structural models or selection of “operational taxonomic units” from metagenomics data. Previous methods for this task, such as CD‐HIT, PISCES, and UCLUST, apply a heuristic threshold‐based algorithm that has no theoretical guarantees. We propose a new approach based on submodular optimization. Submodular optimization, a discrete analogue to...
Despite the availability of chromatin conformation capture experiments, discerning the relationship between the 1D genome and 3D conformation remains a challenge, which limits our understanding of their effect on gene expression and disease. We propose Hi-C-LSTM, a method that produces low-dimensional latent representations that summarize intra-chromosomal Hi-C contacts via a recurrent long short-term memory neural network model. We find that these representations contain all the information needed to recreate the observed Hi-C matrix with high accuracy,...
We present a laboratory demonstration of the Kramers-Kronig relation between resonant absorption and refractive index in rubidium gas. Our experiment uses a rubidium vapor cell in one arm of a simple Mach-Zehnder interferometer. As the laser frequency is scanned over an atomic resonance, the interferometer output is affected by variations in both the absorption and refractive index of the gas with frequency, all of which can be calculated in a straightforward manner. Changing the gas density and the interferometer phase produces a family of different signals. The experiment was performed using commercially available...
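The kind of calculation this abstract alludes to can be sketched for an idealized case. The model below is an assumption, not the paper's analysis: a single isolated Lorentzian resonance in a dilute gas, no Doppler broadening or hyperfine structure, and ideal 50/50 beamsplitters. For such a line the Kramers-Kronig relation ties the extra optical phase to the absorption coefficient. All function and parameter names are hypothetical.

```python
import math


def absorption(detuning: float, alpha0: float, gamma: float) -> float:
    """Lorentzian absorption coefficient.

    detuning: laser frequency minus resonance frequency
    alpha0:   absorption coefficient on resonance (per unit length)
    gamma:    full width at half maximum, in the same units as detuning
    """
    x = 2.0 * detuning / gamma
    return alpha0 / (1.0 + x * x)


def phase_shift(detuning: float, alpha0: float, gamma: float,
                length: float) -> float:
    """Extra phase from the refractive index: -(detuning / gamma) * alpha * L."""
    return -(detuning / gamma) * absorption(detuning, alpha0, gamma) * length


def interferometer_output(detuning: float, alpha0: float, gamma: float,
                          length: float, phi0: float,
                          intensity_in: float = 1.0) -> float:
    """Intensity at one output port with the vapor cell in one arm.

    phi0 is the phase difference between the arms with an empty cell.
    """
    attenuation = absorption(detuning, alpha0, gamma) * length
    phase = phi0 + phase_shift(detuning, alpha0, gamma, length)
    return 0.25 * intensity_in * (
        1.0 + math.exp(-attenuation)
        + 2.0 * math.exp(-attenuation / 2.0) * math.cos(phase)
    )


# Scan across the resonance for one choice of density (alpha0) and phase (phi0).
for detuning in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(detuning, interferometer_output(detuning, 2.0, 1.0, 1.0, math.pi / 2))
```

Varying `alpha0` (which scales with gas density) and `phi0` in this model produces a family of differently shaped scan signals, since the absorption is symmetric about the resonance while the phase shift is antisymmetric.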
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These methods take as input a set of sequencing-based assays of epigenomic activity, such as ChIP-seq measurements of histone modification and transcription factor binding. They output an annotation of the genome that assigns a chromatin state label to each genomic position. Existing SAGA methods have several limitations caused by the discrete annotation framework: such annotations cannot easily represent varying strengths of genomic elements, and they...
Drug repurposing can accelerate the identification of effective compounds for clinical use against SARS-CoV-2, with the advantage of pre-existing clinical safety data and an established supply chain. RNA viruses such as SARS-CoV-2 manipulate cellular pathways and induce reorganization of subcellular structures to support their life cycle. These morphological changes can be quantified using bioimaging techniques. In this work, we developed DEEMD: a computational pipeline using deep neural network models within a multiple...
Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have...
Semi-automated genome annotation methods such as Segway enable understanding of chromatin activity. Here we present chromatin state annotations of 164 human cell types using 1,615 genomics data sets. To produce these annotations, we developed a fully-automated annotation strategy in which we train separate unsupervised annotation models on each cell type and use a machine learning classifier to automate the state interpretation step. Using a measure of the importance of each genomic position called the “conservation-associated activity score,” we aggregate...
Semi-automated genome annotation (SAGA) methods are widely used to understand genome activity and gene regulation. These methods take as input a set of sequencing-based assays of epigenomic activity (such as ChIP-seq measurements of histone modification and transcription factor binding), and output an annotation of the genome that assigns a chromatin state label to each genomic position. Existing SAGA methods have several limitations caused by the discrete annotation framework: such annotations cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial...
Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine...