- Gene expression and cancer classification
- Bioinformatics and Genomic Networks
- Statistical Methods and Inference
- Statistical Methods and Bayesian Inference
- Bayesian Methods and Mixture Models
- Functional Brain Connectivity Studies
- Genetic Associations and Epidemiology
- Machine Learning in Bioinformatics
- Privacy-Preserving Technologies in Data
- COVID-19 and healthcare impacts
- Gene Regulatory Network Analysis
- Healthcare professionals’ stress and burnout
- COVID-19 and Mental Health
- Health, Environment, Cognitive Aging
- Insurance, Mortality, Demography, Risk Management
- Advanced Neuroimaging Techniques and Applications
- Single-cell and spatial transcriptomics
- Advanced MRI Techniques and Applications
- Face and Expression Recognition
- Advanced Causal Inference Techniques
- Epigenetics and DNA Methylation
- Advanced Bandit Algorithms Research
- Data-Driven Disease Surveillance
- Machine Learning and Data Classification
- Markov Chains and Monte Carlo Methods
Indiana University – Purdue University Indianapolis
2023-2025
Indiana University School of Medicine
2023-2025
University of Pennsylvania
2018-2023
Weatherford College
2021
Flint Institute Of Arts
2021
Emory University
2016
University of Chicago
2010-2012
Abstract: Background: Otolaryngologists are among the highest risk for COVID-19 exposure. Methods: This is a cross-sectional, survey-based, national study evaluating academic otolaryngologists. Burnout, anxiety, distress, and depression were assessed by the single-item Mini-Z Burnout Assessment, the 7-item Generalized Anxiety Disorder Scale, the 15-item Impact of Event Scale, and the 2-item Patient Health Questionnaire, respectively. Results: A total of 349 physicians completed the survey. Of them, 165 (47.3%) were residents and 212...
Abstract: Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted on general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our results...
Nonphysician health care workers are involved in high-risk patient care during the COVID-19 pandemic, placing them at high risk of mental health burden. The impact on this crucial population has not been studied thus far. Thus, the objective of this study is to assess the psychosocial well-being of these providers. National cross-sectional online survey (no control group). Academic otolaryngology programs in the United States. We distributed a survey to nonphysician health care workers in otolaryngology departments across the United States. The survey incorporated a variety of validated assessment tools...
Abstract: Distributed health data networks (DHDNs) leverage data from multiple sources or sites, such as electronic health records (EHRs) from multiple healthcare systems, and have drawn increasing interest in recent years, as they do not require sharing of subject-level data and hence lower the hurdles for collaboration between institutions considerably. However, DHDNs face a number of challenges in data analysis, particularly in the presence of missing data. The current state-of-the-art methods for handling incomplete data require pooling data into a central repository...
Abstract: While high circulating tumor DNA (ctDNA) levels are associated with poor survival for multiple cancers, variant-specific differences in the association of ctDNA levels and survival have not been examined. Here we investigate KRAS ctDNA (ctKRAS) variant-specific associations with overall and progression-free survival (OS/PFS) in first-line metastatic pancreatic ductal adenocarcinoma (mPDAC) for patients receiving chemoimmunotherapy (“PRINCE”, NCT03214250), and an independent cohort receiving standard of care (SOC) chemotherapy. For PRINCE, higher baseline plasma...
Summary: Variable selection for structured covariates lying on an underlying known graph is a problem motivated by practical applications, and has been a topic of increasing interest. However, most of the existing methods may not be scalable to high-dimensional settings involving tens of thousands of variables lying on known pathways, such as the case in genomics studies. We propose an adaptive Bayesian shrinkage approach which incorporates prior network information by smoothing the shrinkage parameters for connected variables in the graph, so that the corresponding...
Abstract: Motivation: With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer’s disease (AD). Though many existing AD studies mainly focus on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we proposed a novel structural Bayesian factor analysis framework (SBFA) to extract the information shared by multi-omics data through the aggregation of genotyping data, gene expression data, neuroimaging phenotypes and...
Brain imaging genomics has manifested considerable potential in illuminating the genetic determinants of human brain structure and function. This has propelled us to develop GIANT (Genetically Informed brAiN aTlas), which accounts for genetic and neuroanatomical variations simultaneously. Integrating voxel-wise heritability and spatial proximity, GIANT clusters brain voxels into genetically informed regions, while retaining fundamental anatomical knowledge. Compared to conventional (non-genetics) atlases, GIANT exhibits smaller...
There has been an increasing interest in decomposing high-dimensional multi-omics data into a product of low-rank and sparse matrices for the purpose of dimension reduction and feature engineering. Bayesian factor models achieve such low-dimensional representation of the original data through different sparsity-inducing priors. However, few of these models can efficiently incorporate the information encoded by biological graphs, which has already been proven to be useful in many analysis tasks. In this work, we propose a Bayesian factor model with novel...
Integrative clustering is a clustering approach for multiple datasets, which provide different views of a common group of subjects. It enables analyzing multi-omics data jointly to, for example, identify the subtypes of diseases, cells, and so on, capturing the complex underlying biological processes more precisely. On the other hand, there has been a great deal of interest in incorporating prior structural knowledge on the features into statistical analyses over the past decade. The knowledge on the gene regulatory network (pathways) can potentially...
Biclustering techniques can identify local patterns of a data matrix by clustering the feature space and the sample space at the same time. Various biclustering methods have been proposed and successfully applied to the analysis of gene expression data. While existing biclustering methods have many desirable features, most of them are developed for continuous data and few of them can efficiently handle -omics data of various types, for example, binomial data as in single nucleotide polymorphism data or negative binomial data as in RNA-seq data. In addition, none of the existing methods can utilize biological information such as those from...
Support vector machine (SVM) is a popular classification method for the analysis of a wide range of data including big data. Many SVM methods with feature selection have been developed under frequentist regularization or Bayesian shrinkage frameworks. On the other hand, the importance of incorporating a priori known biological knowledge, such as gene pathway information which stems from the gene regulatory network, into the statistical analysis of genomic data has been recognized in recent years. In this article, we propose a new Bayesian SVM approach that...
Abstract: There is a growing body of literature on knowledge-guided statistical learning methods for analysis of structured high-dimensional data (such as genomic and transcriptomic data) that can incorporate knowledge of underlying networks derived from functional genomics and functional proteomics. These methods have been shown to improve variable selection and prediction accuracy and yield more interpretable results. However, these methods typically use graphs extracted from existing databases or rely on subject matter expertise, which are...
Abstract: Electronic health records (EHRs) offer great promise for advancing precision medicine and, at the same time, present significant analytical challenges. Particularly, it is often the case that patient-level data in EHRs cannot be shared across institutions (data sources) due to government regulations and/or institutional policies. As a result, there is growing interest in distributed learning over multiple EHR databases without sharing patient-level data. To tackle such challenges, we propose novel...
Multi-modal molecular profiling data in bulk tumors or single cells are accumulating at a fast pace. There is a great need for developing statistical and computational methods to reveal molecular structures in complex data types toward biological discoveries. Here, we introduce Nebula, a novel Bayesian integrative clustering analysis for high dimensional multi-modal molecular data to identify directly interpretable clusters and associated biomarkers in a unified and biologically plausible framework. To facilitate computational efficiency, a variational Bayes...
Support vector machine (SVM) is a popular classification method for the analysis of high dimensional data such as genomics data. Recently, a number of linear SVM methods have been developed to achieve feature selection through either frequentist regularization or Bayesian shrinkage, but the linear assumption may not be plausible for many real applications. In addition, recent work has demonstrated that incorporating known biological knowledge, such as those from functional genomics, into the statistical analysis of genomic data offers great...
Variable selection for structured covariates lying on an underlying known graph is a problem motivated by practical applications, and has been a topic of increasing interest. However, most of the existing methods may not be scalable to high dimensional settings involving tens of thousands of variables lying on known pathways, such as the case in genomics studies. We propose an adaptive Bayesian shrinkage approach which incorporates prior network information by smoothing the shrinkage parameters for connected variables in the graph, so that the corresponding...
Abstract: Biclustering is a useful method for simultaneously grouping samples and features and has been applied across various biomedical data types. However, most existing biclustering methods lack the ability to integratively analyze multi-modal data such as multi-omics data from the genome, transcriptome and epigenome. Moreover, the potential of leveraging biological knowledge represented by graphs, which has been demonstrated to be beneficial in various statistical tasks such as variable selection and prediction, remains largely untapped in the context of...
Kidney obstruction, if untreated in a timely manner, can lead to irreversible loss of renal function. A widely used technology for evaluations of kidneys with suspected obstruction is diuresis renography. However, it is generally very challenging for radiologists who typically interpret renography data in practice to build a high level of competency due to the low volume of renography studies and insufficient training. Another challenge is that there is currently no gold standard for detection of kidney obstruction. Seeking to develop...
With distinct advantages in power over behavioral phenotypes, brain imaging traits have become emerging endophenotypes to dissect molecular contributions to behaviors and neuropsychiatric illnesses. Among different imaging features, brain structural connectivity (i.e., the structural connectome), which summarizes the anatomical connections between different brain regions, is one of the most cutting edge while under-investigated traits; the genetic influence on structural connectome variation remains highly elusive. Relying on a landmark imaging genetics study for young...
Existing missing data methods for functional data mainly focus on reconstructing missing measurements along a single function (a univariate functional data setting). Motivated by a renal study, we focus on a bivariate functional data setting, where each sampling unit is a collection of two distinct component functions, one of which may be missing. Specifically, we propose a Bayesian multiple imputation approach based on a bivariate functional latent factor model that exploits the joint changing patterns of the component functions to allow accurate and stable imputation of one component given the other. We further extend the framework...