- Single-cell and spatial transcriptomics
- Gene expression and cancer classification
- Epigenetics and DNA Methylation
- Cancer-related molecular mechanisms research
- Cell Image Analysis Techniques
- Extracellular vesicles in disease
- Genomics and Chromatin Dynamics
- Cancer Genomics and Diagnostics
- Bioinformatics and Genomic Networks
- RNA Research and Splicing
- Immune cells in cancer
- Molecular Biology Techniques and Applications
- Gene Regulatory Network Analysis
- Genetic Associations and Epidemiology
- MicroRNA in disease regulation
- Genetic Mapping and Diversity in Plants and Animals
- Neuroinflammation and Neurodegeneration Mechanisms
- Microstructure and Mechanical Properties of Steels
- Advanced Decision-Making Techniques
- Genetic and phenotypic traits in livestock
- RNA and protein synthesis mechanisms
- Domain Adaptation and Few-Shot Learning
- Hydrogen embrittlement and corrosion behaviors in metals
- Bayesian Methods and Mixture Models
- Genomics and Phylogenetic Studies
Chinese University of Hong Kong
2020-2025
Stanford University
2016-2020
Broad Institute
2020
National Postdoctoral Association
2020
Northeastern University
2006-2018
University of California, San Francisco
2018
Yunnan Environmental Protection Bureau
2018
Yale University
2014-2017
Yunnan Agricultural University
2011
The University of Western Australia
1989
Mendelian randomization (MR) is a valuable tool for inferring causal relationships among a wide range of traits using summary statistics from genome-wide association studies (GWASs). Existing summary-level MR methods often rely on strong assumptions, resulting in many false-positive findings. To relax these assumptions, ongoing research has primarily focused on accounting for confounding due to pleiotropy. Here, we show that sample structure is another major factor, including population stratification, cryptic...
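The method summarized above is not reproduced here. That method additionally accounts for pleiotropy and sample structure. What follows is a minimal sketch of the classical fixed-effect inverse-variance weighted (IVW) estimator, which shows how a causal effect can be estimated from GWAS summary statistics alone. It assumes independent SNP instruments, no pleiotropy, and negligible error in the exposure effects. These are exactly the kinds of strong assumptions the abstract says can produce false positives. The function name `ivw_estimate` is illustrative.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate (illustrative).

    Each SNP j gives a Wald ratio beta_outcome[j] / beta_exposure[j].
    The IVW estimate equals a weighted regression of beta_outcome on
    beta_exposure through the origin, with weights 1 / se_outcome[j]**2.
    Assumes independent, valid instruments (no pleiotropy, no sample overlap).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    num = 0.0
    den = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / sy ** 2          # inverse-variance weight for this SNP
        num += w * bx * by
        den += w * bx * bx
    estimate = num / den
    std_error = math.sqrt(1.0 / den)   # fixed-effect standard error
    return estimate, std_error
```

For toy SNPs whose outcome effects are exactly half their exposure effects, the estimate recovers a causal effect of 0.5. Real analyses would check for heterogeneity and pleiotropy before trusting such a number.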
The rapid emergence of spatial transcriptomics (ST) technologies is revolutionizing our understanding of tissue architecture and biology. Although current ST methods, whether based on next-generation sequencing (seq-based approaches) or fluorescence in situ hybridization (image-based approaches), offer valuable insights, they face limitations in either cellular resolution or transcriptome-wide profiling. To address these limitations, we present SpatialScope, a unified approach integrating scRNA-seq...
Characterizing epigenetic heterogeneity at the cellular level is a critical problem in the modern genomics era. Assays such as single-cell ATAC-seq (scATAC-seq) offer an opportunity to interrogate this heterogeneity through patterns of variability in open chromatin. However, these assays exhibit technical variability that complicates clear classification and cell type identification in heterogeneous populations. We present scABC, an R package for the unsupervised clustering of single-cell data, to classify scATAC-seq data and discover regions of open chromatin...
The recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling of the epigenetic landscapes of thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, a high degree of sparsity and technical variation, make computational analysis challenging. Reference-guided approaches, which utilize information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis...
Pooled CRISPR screens allow researchers to interrogate genetic causes of complex phenotypes at the genome-wide scale and promise higher specificity and sensitivity compared to competing technologies. Unfortunately, two problems exist, particularly for CRISPRi/a screens: variability in guide efficiency and large, rare off-target effects. We present a method, CRISPhieRmix, that resolves these issues by using a hierarchical mixture model with a broad-tailed null distribution. We show that CRISPhieRmix allows more...
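The hierarchical model in the abstract, which pools guides within genes, is not reproduced here. The sketch below is a generic, non-hierarchical two-component mixture that shows why a broad-tailed null matters. It assumes that null guide scores follow a Student-t distribution, whose heavy tails stand in for rare large off-target effects, and that effective guides follow a shifted normal. The weight `pi1`, the degrees of freedom `df`, and the values `mu` and `sigma` are all hypothetical choices. Under this null, an isolated, extreme score far from the alternative is not automatically attributed to the non-null component, unlike under a narrow normal null.

```python
import math

def student_t_pdf(x, df, scale=1.0):
    # Density of a location-0 Student-t with `df` degrees of freedom and `scale`.
    z = x / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))

def normal_pdf(x, mu, sigma):
    # Gaussian density with mean `mu` and standard deviation `sigma`.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posterior_non_null(x, pi1=0.1, df=3.0, null_scale=1.0, mu=-3.0, sigma=1.0):
    """Posterior probability that score x comes from the (hypothetical) effective-guide component.

    Two-groups mixture: (1 - pi1) * t_df null + pi1 * Normal(mu, sigma) alternative.
    """
    f0 = student_t_pdf(x, df, null_scale)   # broad-tailed null density
    f1 = normal_pdf(x, mu, sigma)           # alternative (e.g. depletion) density
    return pi1 * f1 / (pi1 * f1 + (1 - pi1) * f0)
```

Under these toy settings, a score near the alternative mean gets a high posterior. A score of -8 is mostly absorbed by the heavy-tailed null. In a hierarchical model, evidence from several guides targeting the same gene would then be combined.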
Single-cell multiomics technologies provide an unprecedented opportunity to study cellular heterogeneity from different layers of transcriptional regulation. However, the datasets generated by these technologies tend to have high levels of noise, making data analysis challenging. Here, we propose jointly semi-orthogonal nonnegative matrix factorization (JSNMF), a versatile toolkit for the integrative analysis of transcriptomic and epigenomic data profiled from the same cell. JSNMF enables visualization and clustering of cells...
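JSNMF's joint, semi-orthogonal formulation is not reproduced here. As a sketch of the underlying building block only, the code below factorizes a single nonnegative matrix with plain NMF. It uses the Lee-Seung multiplicative updates for squared Frobenius loss. The function name, the random initialization, and the small `eps` guard against division by zero are illustrative choices, not the toolkit's API.

```python
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-10, seed=0):
    """Plain NMF (illustrative): X (features x cells) ~ W @ H with W, H >= 0.

    Lee-Seung multiplicative updates; each update keeps the factors
    nonnegative and does not increase ||X - W @ H||_F^2 (up to eps).
    """
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("X must be nonnegative")
    rng = np.random.default_rng(seed)
    n_features, n_cells = X.shape
    W = rng.random((n_features, rank)) + eps   # feature loadings
    H = rng.random((rank, n_cells)) + eps      # per-cell factor scores
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Columns of `H` give a low-dimensional representation of each cell that could feed visualization or clustering. A joint method would instead share such a representation across modalities measured in the same cells.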
With the advancement in high-throughput technologies, analyzing high-dimensional data has become a common task. Dimension reduction methods have been applied to visualize and identify dominant patterns in data. Confounding factors, commonly observed in biological experiments, can affect the performance of these methods and of other downstream analyses. Here, we develop a method coupling dimension reduction with adjustment for confounder effects. Our method is able to capture the underlying patterns, as demonstrated...
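The abstract describes coupling dimension reduction with confounder adjustment in one method, and that method is not reproduced here. For orientation, the code below sketches a common and simpler two-step baseline. It regresses known confounders, such as a batch indicator, out of every feature by least squares, then runs PCA on the residuals via SVD. It assumes the confounders are observed and act additively. The function name is illustrative.

```python
import numpy as np

def pca_after_confounder_adjustment(X, C, n_components=2):
    """Two-step baseline (illustrative): regress confounders out, then PCA.

    X: samples x features; C: samples x q matrix (or vector) of known confounders.
    Returns sample scores on the leading principal components of the residuals.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    C = np.asarray(C, dtype=float).reshape(n, -1)
    design = np.column_stack([np.ones(n), C])          # intercept + confounders
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)  # per-feature OLS fits
    residuals = X - design @ coef                      # centered, orthogonal to design
    U, S, _ = np.linalg.svd(residuals, full_matrices=False)
    return U[:, :n_components] * S[:n_components]      # principal component scores
```

Because least-squares residuals are orthogonal to the design matrix, the resulting scores carry no linear trace of the confounders. The drawback of doing the two steps separately is that any biological signal correlated with the confounders is removed too.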
In this article, we first propose a Bayesian neighborhood selection method to estimate Gaussian Graphical Models (GGMs). We show the graph consistency of the method, in the sense that the posterior probability of the true model converges to one. When multiple groups of data are available, instead of estimating the networks independently for each group, joint estimation may utilize information shared among groups and lead to improved estimation of each individual network. Our method is extended to jointly estimate GGMs with complex structures, including spatial...
Single-cell RNA-sequencing (scRNA-seq) is being used extensively to measure the mRNA expression of individual cells from deconstructed tissues, organs and even entire organisms to generate cell atlas references, leading to discoveries of novel cell types and deeper insight into biological trajectories. These massive datasets are usually collected from many samples using different scRNA-seq technology platforms, including the popular SMART-Seq2 (SS2) and 10X platforms. Inherent heterogeneities between tissues...
Spatially resolved transcriptomics (SRT) has transformed tissue biology by linking gene expression profiles with spatial information. However, sequencing-based SRT methods aggregate signals from multiple cell types within capture locations (“spots”), masking cell-type-specific patterns. Traditional cell-type deconvolution methods estimate cell-type compositions within spots but fail to resolve cell-type-specific expression, limiting their ability to uncover critical biological processes such as cellular interactions and...
The genetic basis of phenotypic differences among individuals is an important area of biology and personalized medicine. Analysis of divergent Saccharomyces cerevisiae strains grown under different conditions revealed extensive variation in response to both drugs (e.g., 4-nitroquinoline 1-oxide [4NQO]) and carbon sources. Differences in 4NQO resistance were due to amino acid differences in the transcription factor Yrr1. The YJM789 allele of Yrr1 conferred resistance but caused slower growth on glycerol, and vice versa for the S96 allele, indicating that Yrr1 alleles confer...
Unsupervised methods, including clustering, are essential to the analysis of single-cell genomic data. Model-based methods are under-explored in the area of single-cell genomics and have the advantage of quantifying uncertainty in the clustering result. Here we develop a model-based approach for the integrative analysis of single-cell chromatin accessibility and gene expression data. We show that by combining these two types of data, we can achieve better separation of the underlying cell types. An efficient Markov chain Monte Carlo algorithm is also developed.
Unsupervised methods, such as clustering, are essential to the analysis of single-cell genomic data. Most current methods are designed for one data type only, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and a few have been developed for the integrative analysis of multiple data types. Integrative analysis of multimodal data sets leverages the power in multiple data sets and can deepen biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC). Our proposed coupleCoC builds upon...
Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different powers in identifying unknown cell types through clustering. So, methods that integrate multiple datasets can potentially lead to better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer...
The diagnosis of solid pseudopapillary neoplasm of the pancreas (SPN) can be challenging due to potential confusion with other pancreatic neoplasms, particularly neuroendocrine tumors (NETs), when using current pathological diagnostic markers. We conducted a comprehensive analysis of bulk RNA sequencing data from SPNs, NETs, and normal pancreas, followed by experimental validation. This revealed an increased accumulation of peroxisomes in SPNs. Moreover, we observed significant upregulation of peroxisome...
The rapid rise in the availability and scale of scRNA-seq data calls for scalable methods for integrative analysis. Though many integration methods have been developed, few focus on understanding the heterogeneous effects of biological conditions across different cell populations. Our proposed approach, scParser, models the heterogeneous effects of biological conditions, which unveils key mechanisms by which gene expression contributes to phenotypes. Notably, an extended scParser pinpoints biological processes in cell subpopulations that contribute to disease pathogenesis....
Single-cell RNA-sequencing (scRNA-seq) technology, a powerful tool for analyzing the entire transcriptome at the single-cell level, is receiving increasing research attention. The presence of dropouts is an important characteristic of scRNA-seq data that may affect the performance of downstream analyses, such as dimensionality reduction and clustering. Cells sequenced to lower depths tend to have more dropouts than those sequenced to greater depths. In this study, we aimed to develop a method that addresses both non-negativity constraints...
The advancement in technologies and the growth of available single-cell datasets motivate integrative analysis of multiple single-cell genomic datasets. Integrative analysis of multimodal datasets combines the complementary information offered by single-omic datasets and can offer deeper insights into complex biological processes. Clustering methods that identify unknown cell types are among the first few steps in analyzing single-cell datasets, and they are important for downstream analyses built upon the identified cell types. We propose scAMACE for the integrative clustering of single-cell data on chromatin accessibility, gene...
Transcription factors (TFs) play crucial roles in regulating gene expression through interactions with specific DNA sequences. Recently, the sequence motifs of almost 400 human TFs have been identified using high-throughput SELEX sequencing. However, there remain a large number (∼800) of human TFs with no high-throughput-derived binding motifs. Computational methods capable of associating known motifs with such TFs would avoid tremendous experimental efforts and enable a deeper understanding of transcriptional regulatory...
Human neurodevelopment is a highly regulated biological process. In this article, we study the dynamic changes of neurodevelopment through the analysis of human brain microarray data, sampled from 16 brain regions in 15 time periods of neurodevelopment. We develop a two-step inferential procedure to identify expressed and unexpressed genes and to detect genes differentially expressed between adjacent time periods. Markov Random Field (MRF) models are used to efficiently utilize the information embedded in brain region similarity and temporal dependency in our approach....
Recent advances in spatial transcriptomics (ST) have enabled comprehensive profiling of gene expression with spatial information in the context of the tissue microenvironment. However, with improvements in the resolution and scale of ST data, deciphering spatial domains precisely while ensuring efficiency and scalability is still challenging. Here, we develop SGCAST, an efficient auto-encoder framework to identify spatial domains. SGCAST adopts a symmetric graph convolutional auto-encoder to learn aggregated latent embeddings via integrating...
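The truncated abstract does not specify SGCAST's layers, so nothing below should be read as its implementation. As a generic sketch of what "aggregating" expression over a spatial graph means, the code performs one graph-convolution step with the widely used symmetric normalization D^{-1/2}(A + I)D^{-1/2} on a k-nearest-neighbour graph of spot coordinates. The choice of kNN graph, the ReLU activation, the weight matrix `W`, and the function names are all illustrative assumptions.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Symmetric k-nearest-neighbour adjacency (0/1) from spot coordinates."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # a spot is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]        # indices of the k closest spots
    A = np.zeros(d.shape)
    rows = np.repeat(np.arange(len(coords)), k)
    A[rows, nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                  # make the graph undirected

def gcn_layer(A, X, W):
    """One graph convolution: ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)     # neighbour-averaged, projected features
```

Each output row mixes a spot's expression with that of its spatial neighbours, which smooths noisy spots toward their local domain. An auto-encoder would stack such layers and train `W` to reconstruct the input.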