- Genomics and Chromatin Dynamics
- Gene expression and cancer classification
- Epigenetics and DNA Methylation
- Statistical Methods and Inference
- Single-cell and spatial transcriptomics
- Diet and metabolism studies
- Bioinformatics and Genomic Networks
- Molecular Biology Techniques and Applications
- Genetic Mapping and Diversity in Plants and Animals
- Genomics and Phylogenetic Studies
- Chromosomal and Genetic Variations
- Statistical Methods in Clinical Trials
- Advanced Proteomics Techniques and Applications
- RNA modifications and cancer
- Nutrition and Health in Aging
- Dietary Effects on Health
- Mass Spectrometry Techniques and Applications
- Cell Image Analysis Techniques
- Vitamin C and Antioxidants Research
- Bayesian Methods and Mixture Models
- Gene Regulatory Network Analysis
- Metabolism, Diabetes, and Cancer
- Advanced Statistical Methods and Models
- Retinoids in leukemia and cellular processes
- Antioxidant Activity and Oxidative Stress
Pennsylvania State University
2015-2024
Chengdu University
2022
Agilent Technologies (United States)
2017
Affiliated Hospital of North Sichuan Medical College
2016
University of Washington
2006-2012
University of Chicago
2010-2012
University of California, Berkeley
2009-2011
Guangdong General Hospital
2009
Guangdong Academy of Medical Sciences
2009
Fred Hutch Cancer Center
2007
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience...
Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and to identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call...
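The idea of a reproducibility curve can be illustrated with a minimal sketch. It assumes the simplest empirical version: for each fraction t, Psi(t) is the share of all items that rank in the top t of both replicates. This is not the copula mixture model or the reproducibility score described above, and the function name `correspondence_curve` is hypothetical.

```python
def correspondence_curve(scores1, scores2, ts):
    """Empirical Psi(t): fraction of the n items ranked in the top t*n of BOTH replicates.

    Higher scores are treated as stronger findings. Hypothetical helper for illustration.
    """
    n = len(scores1)
    if n != len(scores2):
        raise ValueError("replicates must score the same items")
    # Rank 1 = strongest signal in each replicate (ties broken by item index).
    order1 = sorted(range(n), key=lambda i: -scores1[i])
    order2 = sorted(range(n), key=lambda i: -scores2[i])
    rank1 = [0] * n
    rank2 = [0] * n
    for r, i in enumerate(order1, start=1):
        rank1[i] = r
    for r, i in enumerate(order2, start=1):
        rank2[i] = r
    curve = []
    for t in ts:
        cutoff = t * n
        both = sum(1 for i in range(n) if rank1[i] <= cutoff and rank2[i] <= cutoff)
        curve.append(both / n)
    return curve
```

With perfectly concordant replicates the curve follows Psi(t) = t; with independent replicates it stays near t^2, so the point where the empirical curve bends away from the diagonal indicates where findings stop being consistent.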
Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only...
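A simplified sketch of a stratum-adjusted correlation, assuming square contact matrices given as nested lists. Each stratum is one diagonal (one genomic distance); its Pearson correlation is weighted by the stratum size times the product of the standard deviations of its normalized ranks. This is our reading of the SCC, not HiCRep's code: it omits HiCRep's smoothing step and any choice of distance range, and the names `scc` and `max_dist` are hypothetical.

```python
import math
from statistics import pstdev


def _pearson(x, y):
    """Pearson correlation, or None when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def _normalized_ranks(v):
    """Ranks 1..n (ties share their average rank), divided by n."""
    n = len(v)
    order = sorted(range(n), key=lambda i: v[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return [r / n for r in ranks]


def scc(m1, m2, max_dist):
    """Simplified stratum-adjusted correlation between two square contact matrices."""
    if len(m1) != len(m2):
        raise ValueError("matrices must have the same dimensions")
    num = den = 0.0
    for d in range(max_dist + 1):  # stratum d = d-th diagonal, i.e. one genomic distance
        x = [m1[i][i + d] for i in range(len(m1) - d)]
        y = [m2[i][i + d] for i in range(len(m2) - d)]
        if len(x) < 2:
            continue
        rho = _pearson(x, y)
        if rho is None:
            continue  # constant stratum: correlation undefined, so it is skipped
        w = len(x) * pstdev(_normalized_ranks(x)) * pstdev(_normalized_ranks(y))
        num += w * rho
        den += w
    return num / den if den > 0 else float("nan")
```

Stratifying by distance keeps the strong distance decay of contact counts from inflating the correlation, which is the distance-dependence problem the abstract raises.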
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation but also for computational analysis....
Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. Using real and simulated data, we profile the performance of several recently proposed methods for assessing the reproducibility of population Hi-C data, including HiCRep, GenomeDISCO,...
The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not homogeneous structural units but appear to be organized into a hierarchy. We present OnTAD, an optimized nested TAD caller from Hi-C data, to identify hierarchical TADs. OnTAD reveals new biological insights into the role of different TAD levels,...
Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established...
Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real variation from raw data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions...
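A minimal sketch of the single-factor rescaling mentioned in the last sentence, assuming per-bin signal tracks as lists and a boolean mask selecting the regions (background bins or peak bins) used to compute the factor. The helper name `rescale_to_reference` is hypothetical, and this is not the normalization method the abstract goes on to propose.

```python
def rescale_to_reference(sample, reference, use_region):
    """Multiply every bin of `sample` by one factor so that its mean over the
    selected regions equals the reference mean over the same regions."""
    sel_s = [s for s, keep in zip(sample, use_region) if keep]
    sel_r = [r for r, keep in zip(reference, use_region) if keep]
    if not sel_s:
        raise ValueError("no regions selected")
    mean_s = sum(sel_s) / len(sel_s)
    mean_r = sum(sel_r) / len(sel_r)
    if mean_s == 0:
        raise ValueError("sample has zero mean signal in the selected regions")
    factor = mean_r / mean_s
    return [s * factor for s in sample]
```

Because one multiplicative factor can match only one region type at a time, two tracks with different signal-to-noise ratios that agree on background after rescaling can still disagree on peaks, as in the usage below.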
Knowledge of locations and activities...
Mass spectrometry provides a high-throughput way to identify proteins in biological samples. In a typical experiment, proteins in a sample are first broken into their constituent peptides. The resulting mixture of peptides is then subjected to mass spectrometry, which generates thousands of spectra, each characteristic of its generating peptide. Here we consider the problem of inferring, from these spectra, which proteins and peptides are present in the sample. We develop a statistical approach to the problem, based on a nested mixture model. In contrast to commonly used two-stage...
Intra-species genetic variation can be used to investigate population structure, selection, and gene flow in non-model vertebrates; and due to the plummeting costs for genome sequencing, it is now possible for small labs to obtain full-genome variation data from their species of interest. However, those labs may not have easy access to, and familiarity with, computational tools to analyze those data. We have created a suite of tools for the Galaxy web server aimed at handling nucleotide and amino-acid polymorphisms discovered by full-genome sequencing of several...
Determining differentially expressed genes (DEGs) between biological samples is the key to understanding how genotype gives rise to phenotype. RNA-seq and microarray are the two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the two technologies. Integration of data across these two platforms has the potential to improve the power and reliability of DEG detection. We propose a rank-based semi-parametric model to determine DEGs using information across different sources and apply it...
Sequencing of the T cell receptor (TCR) repertoire is a powerful tool for deeper study of the immune response, but the unique structure of this type of data makes its meaningful quantification challenging. We introduce a new method, the Gamma-GPD spliced threshold model, to address this difficulty. This biologically interpretable model captures the distribution of the TCR repertoire, demonstrates stability across varying sequencing depths, and permits comparative analysis across any number of sampled individuals. We apply our method to several...
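To make "spliced threshold model" concrete, here is a minimal sketch of a generic continuous Gamma-GPD spliced density: below a threshold u a Gamma density, renormalized to carry mass 1 - phi; above u a generalized Pareto (GPD) density on the exceedance x - u, carrying mass phi. The parametrization (phi, shape, scale, sigma, xi) and all function names are assumptions for illustration; the paper's exact formulation, threshold selection, and fitting procedure are not given here.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density (shape k, scale theta) for x > 0."""
    if x <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def gamma_cdf(x, shape, scale, max_terms=100000):
    """Regularized lower incomplete gamma P(shape, x/scale), via its power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape  # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= z / (shape + n)
        total += term
        if term < total * 1e-16:
            break
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total


def gpd_pdf(y, sigma, xi):
    """Generalized Pareto density for an exceedance y >= 0 (scale sigma, shape xi)."""
    if y < 0:
        return 0.0
    if xi == 0:
        return math.exp(-y / sigma) / sigma
    base = 1 + xi * y / sigma
    if base <= 0:
        return 0.0  # beyond the finite upper endpoint when xi < 0
    return base ** (-1 / xi - 1) / sigma


def spliced_pdf(x, u, phi, shape, scale, sigma, xi):
    """Gamma bulk below threshold u (total mass 1 - phi); GPD tail above u (mass phi)."""
    if x <= u:
        return (1 - phi) * gamma_pdf(x, shape, scale) / gamma_cdf(u, shape, scale)
    return phi * gpd_pdf(x - u, sigma, xi)
```

Splicing lets the bulk of small clones and the heavy tail of expanded clones be described by separate parameters; phi is then directly the fraction of the distribution in the tail.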
Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power...
Basal colonic crypt stem cells are long lived and play a role in colon homeostasis. Previous evidence has shown that a high-calorie diet (HCD) enhances stem cell numbers and expansion of the proliferative zone, an important biomarker for colon cancer. However, it is not clear how HCD drives dysregulation of stem cell/colonocyte kinetics. We used a human-relevant pig model and developed an immunofluorescence technique to detect and quantify stem cells. Pigs (n = 8/group) were provided either a standard diet (SD; 5% fat) or HCD (23% fat) for 13...