- Single-cell and spatial transcriptomics
- Genetic Associations and Epidemiology
- Machine Learning in Bioinformatics
- Gene expression and cancer classification
- Machine Learning and Data Classification
- Bioinformatics and Genomic Networks
- Oral microbiology and periodontitis research
- Advanced Bandit Algorithms Research
- Cancer Genomics and Diagnostics
- Statistical Methods and Inference
- Epigenetics and DNA Methylation
- Atmospheric and Environmental Gas Dynamics
- GaN-based semiconductor devices and materials
- Gut microbiota and health
- Atherosclerosis and Cardiovascular Diseases
- Dental Health and Care Utilization
- Silicon Carbide Semiconductor Technologies
- Metabolomics and Mass Spectrometry Studies
- Genetic Mapping and Diversity in Plants and Animals
- Bayesian Methods and Mixture Models
- Cell Image Analysis Techniques
- Extracellular vesicles in disease
- Medical and Biological Sciences
- Spectroscopy and Chemometric Analyses
- HIV/AIDS oral health manifestations
Harvard University
2019-2025
Broad Institute
2021-2024
Carnegie Mellon University
2023-2024
Harvard University Press
2020-2023
Universität Innsbruck
2023
Stanford University
2016-2022
Palo Alto University
2018-2022
University of California, San Francisco
2021
Nippon Dental University
2021
Université Paris Cité
2021
Type 2 diabetes mellitus (T2D) is a growing health problem, but little is known about its early disease stages, its effects on biological processes or the transition to clinical T2D. To understand the earliest stages of T2D better, we obtained samples from 106 healthy individuals and individuals with prediabetes over approximately four years and performed deep profiling of transcriptomes, metabolomes, cytokines, and proteomes, as well as changes in the microbiome. This rich longitudinal data set revealed many insights:...
Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched...
An underlying question for virtually all single-cell RNA sequencing experiments is how to allocate the limited sequencing budget: deep sequencing of a few cells or shallow sequencing of many cells? Here we present a mathematical framework which reveals that, for estimating many important gene properties, the optimal allocation is to sequence at a depth of around one read per cell per gene. Interestingly, the corresponding optimal estimator is not the widely-used plug-in estimator, but one developed via empirical Bayes.
Aging is associated with complex molecular and cellular processes that are poorly understood. Here we leveraged the Tabula Muris Senis single-cell RNA-seq data set to systematically characterize gene expression changes during aging across diverse cell types in the mouse. We identified aging-dependent genes in 76 tissue-cell types from 23 tissues and characterized both shared and tissue-cell-specific aging behaviors. We found that the aging-related genes shared by multiple tissue-cell types also change their expression congruently in the same direction during aging in most tissue-cell types, suggesting a...
The influence of seasons on biological processes is poorly understood. In order to identify biological seasonal patterns based on diverse molecular data, rather than calendar dates, we performed a deep longitudinal multiomics profiling of 105 individuals over 4 years. Here, we report more than 1000 seasonal variations in omics analytes and clinical measures. The different molecules group into two major seasonal patterns which correlate with peaks in late spring and late fall/early winter in California. are enriched for involved human such as...
The human hippocampus and prefrontal cortex play critical roles in learning and cognition
Periodontal disease is a microbially-mediated inflammatory disease of tooth-supporting tissues that leads to bone and tissue loss around teeth. Although bacterially-mediated mechanisms of alveolar bone destruction have been widely studied, the effects of a polymicrobial infection on the periodontal ligament and the microbiome/virome have not been well explored. Therefore, the current investigation introduced a new mouse model of periodontal disease to examine the effects of a polymicrobial infection on periodontal ligament (PDL) properties, changes in bone loss, the host immune response, and the microbiome/virome using shotgun sequencing. Periodontal pathogens,...
Dysbiosis of the oral microbiome mediates chronic periodontal disease. Realignment of microbial dysbiosis towards health may prevent periodontal disease. Treatment with antibiotics and probiotics can modulate the microbial, immunological, and clinical landscape of periodontal disease with some success. Antibacterial peptides or bacteriocins, such as nisin, and a nisin-producing probiotic, Lactococcus lactis, have not been examined in this context, yet warrant examination because of their biomedical benefits in eradicating biofilms and pathogenic bacteria,...
Multiple hypothesis testing is an essential component of modern data science. In many settings, in addition to the p-value, additional covariates for each hypothesis are available, e.g., functional annotation of variants in genome-wide association studies. Such information is ignored by popular multiple testing approaches such as the Benjamini-Hochberg procedure (BH). Here we introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. On eQTL...
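For orientation, the Benjamini-Hochberg baseline named in the abstract above can be sketched as follows. This is a minimal illustration of the standard BH step-up rule only, not of AdaFDR; the function name `benjamini_hochberg`, the `alpha` default, and the example p-values are assumptions made for this sketch.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha=0.1):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule.

    Hypothetical helper for illustration; `alpha` is the target FDR level.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)          # indices that sort the p-values ascending
    sorted_p = p[order]
    # BH compares the i-th smallest p-value with alpha * i / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(sorted_p <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if below.size > 0:
        k = below[-1]              # largest rank that passes its threshold
        rejected[order[: k + 1]] = True
    return rejected


# Made-up p-values, used only to exercise the function.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
mask = benjamini_hochberg(example_p, alpha=0.05)
```

Note that the same threshold schedule is applied to every hypothesis regardless of any covariates, which is the limitation the abstract points to.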
The analysis of longitudinal data from electronic health records (EHRs) has the potential to improve clinical diagnoses and enable personalized medicine, motivating efforts to identify disease subtypes from patient comorbidity information. Here we introduce an age-dependent topic modeling (ATM) method that provides a low-rank representation of hundreds of distinct diseases in large EHR datasets. We applied ATM to 282,957 UK Biobank samples, identifying 52 diseases with heterogeneous comorbidity profiles; analyses...
Leveraging data from multiple ancestries can greatly improve fine-mapping power due to differences in linkage disequilibrium and allele frequencies. We propose MultiSuSiE, an extension of the sum of single effects model (SuSiE) to multiple ancestries that allows causal effect sizes to vary across ancestries based on a multivariate normal prior informed by empirical data. We evaluated MultiSuSiE via simulations and analyses of 14 quantitative traits leveraging whole-genome sequencing data in 47k African-ancestry and 94k European-ancestry...
Myasthenia gravis (MG) is etiologically associated with thymus abnormalities, but its pathology in the thymus remains unclear. In this study, we attempt to narrow down the features associated with MG using spatial transcriptome analysis of thymoma and thymic hyperplasia samples. We find that the majority of thymomas are constituted by the cortical region. However, the small medullary region is enlarged in seropositive thymomas and contains polygenic enrichment and MG-specific germinal center structures. Neuromuscular epithelial cells, previously...
Clustering is a ubiquitous task in data science. Compared to the commonly used $k$-means clustering, $k$-medoids clustering requires the cluster centers to be actual data points and supports arbitrary distance metrics, which permits greater interpretability and the clustering of structured objects. Current state-of-the-art $k$-medoids clustering algorithms, such as Partitioning Around Medoids (PAM), are iterative and are quadratic in the dataset size $n$ for each iteration, being prohibitively expensive for large datasets. We propose BanditPAM, a randomized...
With age, acquired mutations can cause clonal expansion of hematopoietic stem cells (HSC). This clonal hematopoiesis of indeterminate potential (CHIP) leads to an increased predisposition to numerous diseases including blood cancer and cardiovascular disease. Here, we report multi-ancestry genome-wide association meta-analyses of CHIP among 323,112 individuals (19.5% non-European; 5.3% have CHIP). We identify 15 significant regions and nominate additional loci through multi-trait analyses, and highlight...
The human frontal cortex and hippocampus play critical roles in learning and cognition. We investigated the epigenomic and 3D chromatin conformational reorganization during the development of the frontal cortex and hippocampus, using more than 53,000 joint single-nucleus profiles of chromatin conformation and DNA methylation (sn-m3C-seq). The remodeling of DNA methylation predominantly occurs during late-gestational to early-infant development and is temporally separated from chromatin conformation dynamics. Neurons have a unique Domain-Dominant chromatin conformation that is different from the Compartment-Dominant conformation of glial cells...
We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover low-dimensional structure that is unique to a dataset, or enriched in one dataset relative to other data. The technique is a generalization of standard PCA, for the setting where multiple datasets are available -- e.g. a treatment and a control group, or a mixed versus a homogeneous population -- and the goal is to explore patterns that are specific to one of the datasets. We conduct a wide variety of experiments in which cPCA identifies important dataset-specific patterns that are missed by...
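The idea in the abstract above can be sketched in a few lines, assuming the common formulation of cPCA in which the contrastive directions are the leading eigenvectors of the target covariance minus `alpha` times the background covariance. The function name `contrastive_pca`, the parameter names, and the synthetic data are all assumptions of this sketch, not the authors' released implementation.

```python
import numpy as np


def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Project `target` onto directions enriched relative to `background`.

    Hypothetical helper: `alpha` is the contrast strength, a hyperparameter
    that trades target variance against background variance.
    """
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)
    cov_target = np.cov(target, rowvar=False)
    cov_background = np.cov(background, rowvar=False)
    # Directions with high target variance and low background variance
    # correspond to large eigenvalues of this symmetric contrast matrix.
    contrast = cov_target - alpha * cov_background
    eigvals, eigvecs = np.linalg.eigh(contrast)
    top = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, top]
    projected = (target - target.mean(axis=0)) @ directions
    return projected, directions


# Synthetic data: axis 0 varies strongly in both datasets,
# axis 1 varies strongly only in the target.
rng = np.random.default_rng(0)
background_data = rng.normal(size=(5000, 3)) * np.array([5.0, 1.0, 1.0])
target_data = rng.normal(size=(5000, 3)) * np.array([5.0, 10.0, 1.0])
projected, directions = contrastive_pca(
    target_data, background_data, alpha=1.0, n_components=1
)
```

With `alpha = 0` this reduces to ordinary PCA on the target data; larger values penalize directions that are also prominent in the background.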
A novel strain engineering is reported to realize enhancement-mode high electron mobility transistors (HEMTs) with ultralow specific on-resistance ($R_{\mathrm{on,sp}}$) fabricated on a 200 mm CMOS-compatible process platform. In this scheme, a strain enhancement layer deposited on the access region of the HEMT by low-cost CVD is demonstrated to reduce $R_{\mathrm{on,sp}}$. Compared with 100 V-rated HEMTs without...
Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and the single-cell RNA-Seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a 5-10x improvement in performance over the current state of the art. Med-dit is available at...
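To make the quantity in the abstract above concrete, here is a brute-force sketch of the exact medoid, the point with the smallest total distance to all other points. It is not Med-dit: it evaluates all n² pairwise distances, which is the cost a bandit-based method aims to avoid. The function name `exact_medoid`, the Euclidean metric, and the example points are assumptions of this sketch.

```python
import numpy as np


def exact_medoid(points):
    """Return the index of the medoid under Euclidean distance.

    Hypothetical baseline: O(n^2) distance evaluations and O(n^2 * d)
    memory, so it is only suitable for small inputs.
    """
    x = np.asarray(points, dtype=float)
    # Pairwise differences have shape (n, n, d); norms give an (n, n) matrix.
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    total_distance = dists.sum(axis=1)
    return int(np.argmin(total_distance))


# Made-up one-dimensional points with a single outlier.
example_points = [[0.0], [1.0], [2.0], [3.0], [100.0]]
medoid_index = exact_medoid(example_points)  # index 2, the point 2.0
```

Unlike the mean, which the outlier would pull to 21.2, the medoid is always one of the input points.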
As datasets grow richer, an important challenge is to leverage the full features in the data to maximize the number of useful discoveries while controlling for false positives. We address this problem in the context of multiple hypotheses testing, where for each hypothesis, we observe a p-value along with a set of features specific to that hypothesis. For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait. We have a rich set of features for each variant (e.g. its location, conservation, epigenetics etc.) which could inform how...
Heritable diseases often manifest in a highly tissue-specific manner, with different disease loci mediated by genes in distinct tissues or cell types. We propose Tissue-Gene Fine-Mapping (TGFM), a fine-mapping method that infers the posterior probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing GWAS summary statistics (and in-sample LD) and leveraging eQTL data from diverse tissues to build cis-predicted expression models; TGFM also assigns PIPs to causal variants that are not mediated by gene...