- Statistical Methods and Inference
- Sparse and Compressive Sensing Techniques
- Random Matrices and Applications
- Bayesian Methods and Mixture Models
- Complex Network Analysis Techniques
- Single-cell and spatial transcriptomics
- Statistical Methods and Bayesian Inference
- Functional Brain Connectivity Studies
- Opinion Dynamics and Social Influence
- Advanced Statistical Methods and Models
- Mental Health Research Topics
- Blind Source Separation Techniques
- Cell Image Analysis Techniques
- Image and Signal Denoising Methods
- Advanced Graph Neural Networks
- Face and Expression Recognition
- Advanced Algebra and Geometry
- Health, Environment, Cognitive Aging
- Gene Regulatory Network Analysis
- Graph theory and applications
- Advanced Combinatorial Mathematics
- Markov Chains and Monte Carlo Methods
- Point processes and geometric inequalities
- Extracellular vesicles in disease
- Advanced Clustering Algorithms Research
Yale University
2017-2024
Dalhousie University
2024
University of Pennsylvania
2014-2023
California University of Pennsylvania
2013-2022
University of Chicago
2017-2020
Huntsman (United States)
2020
University of California, Davis
2016
Philadelphia University
2016
University of Illinois Urbana-Champaign
2015
University of Pittsburgh
2015
Neurobiological abnormalities associated with psychiatric disorders do not map well to existing diagnostic categories. High co-morbidity suggests dimensional circuit-level abnormalities that cross diagnoses. Here we seek to identify brain-based dimensions of psychopathology using sparse canonical correlation analysis in a sample of 663 youths. This analysis reveals correlated patterns of functional connectivity and psychiatric symptoms. We find that four dimensions of psychopathology - mood, psychosis, fear, and externalizing behavior - are associated (r = 0.68-0.71) with distinct...
Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features $p$ is comparable to, or even much larger than, the sample size $n$. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional...
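The iterative thresholding idea in the abstract above can be illustrated with a minimal sketch: multiply by the sample covariance, threshold small entries to zero, then re-orthonormalize. This is not the paper's exact procedure. The toy single-spike data, the fixed threshold `t=1.5`, the initialization from the 20 highest-variance coordinates, and all function names are assumptions made for illustration.

```python
import numpy as np

def hard_threshold(M, t):
    """Zero out entries whose absolute value is below t."""
    return np.where(np.abs(M) >= t, M, 0.0)

def thresholded_power_iteration(S, q0, t, n_iter=30):
    """Orthogonal iteration with a thresholding step (rank-one case)."""
    Q = q0.reshape(-1, 1)
    for _ in range(n_iter):
        T = hard_threshold(S @ Q, t)  # multiply, then sparsify
        Q, _ = np.linalg.qr(T)        # re-orthonormalize
    return Q[:, 0]

# Toy single-spike model; every size and constant here is an assumption.
rng = np.random.default_rng(0)
n, p, k, spike = 200, 500, 10, 10.0
v = np.zeros(p)
v[:k] = 1.0 / np.sqrt(k)  # sparse leading eigenvector
X = rng.standard_normal((n, p)) + np.sqrt(spike) * rng.standard_normal((n, 1)) * v
S = X.T @ X / n  # sample covariance (the mean is known to be zero)

# Assumed initialization: leading eigenvector of S restricted to the
# 20 coordinates with the largest sample variances.
idx = np.argsort(np.diag(S))[-20:]
_, vecs = np.linalg.eigh(S[np.ix_(idx, idx)])
q0 = np.zeros(p)
q0[idx] = vecs[:, -1]

q_hat = thresholded_power_iteration(S, q0, t=1.5)
alignment = abs(q_hat @ v)  # close to 1 means recovery up to sign
print(f"|<q_hat, v>| = {alignment:.3f}, nonzeros = {np.count_nonzero(q_hat)}")
```

In practice the threshold would be chosen from the data rather than fixed by hand; the fixed value here only suits this toy signal strength.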
Principal component analysis (PCA) is one of the most commonly used statistical procedures with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy...
This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $\mathbf{x}\in\mathbb{R}^{p}$ from noisy quadratic measurements $y_{j}=(\mathbf{a}_{j}'\mathbf{x})^{2}+\varepsilon_{j}$, $j=1,\ldots,m$, with independent sub-exponential noise $\varepsilon_{j}$. The goals are to understand the effect of the sparsity of $\mathbf{x}$ on the estimation precision and to construct a computationally feasible estimator to achieve the optimal rates adaptively. Inspired by the Wirtinger Flow [IEEE Trans. Inform. Theory 61...
Continuous treatments (e.g. doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods...
The intestine is a complex organ that promotes digestion, extracts nutrients, participates in immune surveillance, maintains critical symbiotic relationships with microbiota and affects overall health. The intestine has a length of over nine metres, along which there are differences in structure and function. The localization of individual cell types, cell type development trajectories and detailed cell transcriptional programs probably drive these differences in function. Here, to better understand these differences, we evaluated...
Community detection is a fundamental statistical problem in network data analysis. In this paper, we present a polynomial time two-stage method that provably achieves optimal performanc...
This paper considers a sparse spiked covariance matrix model in the high-dimensional setting and studies the minimax estimation of the covariance matrix and the principal subspace as well as the minimax rank detection. The optimal rate of convergence for estimating the covariance matrix under the spectral norm is established, which requires significantly different techniques from those for estimating other structured covariance matrices such as bandable or sparse covariance matrices. We also establish the minimax rate under the spectral norm for estimating the principal subspace, the primary object of interest in principal component analysis. In addition, the optimal rate for the rank detection boundary is obtained....
This paper studies the minimax detection of a small submatrix of elevated mean in a large matrix contaminated by additive Gaussian noise. To investigate the tradeoff between statistical performance and computational cost from a complexity-theoretic perspective, we consider a sequence of discretized models which are asymptotically equivalent to the Gaussian model. Under the hypothesis that the planted clique detection problem cannot be solved in randomized polynomial time when the clique size is of smaller order than the square root of the graph size, the following...
Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high dimensional datasets originated from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. Under a Gaussian canonical pair model, we first establish separate minimax estimation rates for canonical coefficient vectors of each set of random variables under no structural assumption on...
Community detection is a central problem of network data analysis. Given a network, the goal of community detection is to partition the network nodes into a small number of clusters, which could often help reveal interesting structures. The present paper studies community detection in Degree-Corrected Block Models (DCBMs). We first derive asymptotic minimax risks of the problem for a misclassification proportion loss under appropriate conditions. The minimax risks are shown to depend on degree-correction parameters, community sizes and average within and between community connectivities in an intuitive...
Although single-cell and spatial sequencing methods enable simultaneous measurement of more than one biological modality, no technology can capture all modalities within the same cell. For current data integration methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori ‘linked’ features. We describe matching X-modality via fuzzy smoothed embedding (MaxFuse), a cross-modal data integration method that, through iterative coembedding, data smoothing and cell matching, uses all information in each...
The ability to align individual cellular information from multiple experimental sources is fundamental for a systems-level understanding of biological processes. However, currently available tools are mainly designed for single-cell transcriptomics matching and integration, and generally rely on a large number of shared features across datasets for cell matching. This approach underperforms when applied to single-cell proteomic datasets due to the limited number of parameters simultaneously accessed and lack of shared markers across these experiments....
This paper considers testing a covariance matrix $\Sigma$ in the high dimensional setting where the dimension $p$ can be comparable or much larger than the sample size $n$. The problem of testing the hypothesis $H_{0}:\Sigma=\Sigma_{0}$ for a given covariance matrix $\Sigma_{0}$ is studied from a minimax point of view. We first characterize the boundary that separates the testable region from the non-testable region by the Frobenius norm when the ratio between the dimension $p$ over the sample size $n$ is bounded. A test based on a $U$-statistic is introduced and is shown to be rate optimal over this asymptotic regime....
The distributions of the largest and the smallest eigenvalues of a $p$-variate sample covariance matrix $S$ are of great importance in statistics. Focusing on the null case where $nS$ follows the standard Wishart distribution $W_{p}(I, n)$, we study the accuracy of their scaling limits under the setting: $n/p \to \gamma \in (0, \infty)$ as $n \to \infty$. The limits here are the orthogonal Tracy–Widom law and its reflection about the origin. With carefully chosen rescaling constants, the approximation to the rescaled largest eigenvalue distribution by the limit attains accuracy of order $O(\min(n, p)^{-2/3})$. If $\gamma > 1$, the same order of accuracy is...
Community detection is a fundamental statistical problem in network data analysis. Many algorithms have been proposed to tackle this problem. Most of these algorithms are not guaranteed to achieve the statistical optimality of the problem, while procedures that achieve information theoretic limits for general parameter spaces are not computationally tractable. In this paper, we present a computationally feasible two-stage method that achieves optimal statistical performance in misclassification proportion for stochastic block model under weak regularity conditions. Our procedure...
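A two-stage scheme of the kind described in the abstract above (an initial clustering followed by a node-wise refinement) can be sketched on a toy two-community stochastic block model. The abstract is truncated before it describes the actual procedure, so everything here is an assumed simplification: the spectral initialization, the edge-density refinement rule, the function name `refine`, and the model parameters are all illustrative choices, not the paper's method.

```python
import numpy as np

# Toy two-community stochastic block model; all parameters are assumptions.
rng = np.random.default_rng(1)
n, p_in, p_out = 200, 0.30, 0.05
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
upper = np.triu(rng.random((n, n)) < P, k=1)  # sample each pair once
A = (upper | upper.T).astype(float)           # symmetric, zero diagonal

# Stage 1 (assumed initializer): split nodes by the sign of the eigenvector
# belonging to the second-largest eigenvalue of the adjacency matrix.
_, eigvecs = np.linalg.eigh(A)
init = (eigvecs[:, -2] > 0).astype(int)

def refine(A, z):
    """Stage 2 (assumed rule): move each node to the community with which
    it has the highest edge density, given the current labels z."""
    onehot = np.eye(2)[z]                    # n x 2 membership indicators
    edges = A @ onehot                       # edges from each node into each community
    sizes = onehot.sum(axis=0) - onehot      # community sizes excluding the node itself
    return np.argmax(edges / np.maximum(sizes, 1), axis=1)

refined = refine(A, init)

# Misclassification proportion, minimized over the two label permutations.
err = min(np.mean(refined != labels), np.mean(refined == labels))
print(f"misclassification proportion: {err:.3f}")
```

The refinement step only needs a rough initial labeling, which is why a simple spectral split is enough in this well-separated example.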
In this paper, we study community detection when we observe $m$ sparse networks and a high dimensional covariate matrix, all encoding the same community structure among $n$ subjects. In the asymptotic regime where the number of features $p$ and the number of subjects $n$ grow proportionally, we derive an exact formula of asymptotic minimum mean square error...
Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. This paper considers the problem of estimating the leading canonical correlation directions in high-dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many applications involving massive data sets. However, there has been little theoretical justification available in the literature. In this paper, we establish rate-optimal nonasymptotic...
Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high dimensional datasets originated from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. First, we establish separate minimax estimation rates for canonical coefficient vectors of each set of random variables under no structural assumption on marginal covariance matrices....
We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters, which obviate the need for computationally expensive cross-validation. We also introduce a way to sparsely initialize the algorithm for computational savings that allow our algorithm to outperform...
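The thresholded subspace iterations mentioned in the abstract above can be sketched as alternating multiplications by the data matrix and its transpose, each followed by thresholding and re-orthonormalization. This is a simplified illustration, not the authors' algorithm: the thresholds are fixed by hand rather than estimated, the initialization uses an ordinary SVD rather than a sparse one, and the rank-one toy data and all names are assumptions.

```python
import numpy as np

def hard_threshold(M, t):
    """Zero out entries whose absolute value is below t."""
    return np.where(np.abs(M) >= t, M, 0.0)

def thresholded_subspace_iteration(X, V0, t_u, t_v, n_iter=20):
    """Alternate between left and right factors, thresholding before each QR."""
    V = V0
    for _ in range(n_iter):
        U, _ = np.linalg.qr(hard_threshold(X @ V, t_u))
        V, _ = np.linalg.qr(hard_threshold(X.T @ U, t_v))
    return U, V

# Toy rank-one signal plus Gaussian noise; all constants are assumptions.
rng = np.random.default_rng(2)
n, p, k, d = 100, 200, 10, 50.0
u = np.zeros(n)
u[:k] = 1.0 / np.sqrt(k)  # sparse left singular vector
v = np.zeros(p)
v[:k] = 1.0 / np.sqrt(k)  # sparse right singular vector
X = d * np.outer(u, v) + rng.standard_normal((n, p))

# Assumed dense initialization from the ordinary SVD.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V0 = Vt[:1].T  # p x 1 starting block

U_hat, V_hat = thresholded_subspace_iteration(X, V0, t_u=5.0, t_v=5.0)
u_align = abs(U_hat[:, 0] @ u)  # close to 1 means recovery up to sign
v_align = abs(V_hat[:, 0] @ v)
print(f"left alignment = {u_align:.3f}, right alignment = {v_align:.3f}")
```

Passing a starting block with more than one column would update several singular vectors in the same iteration, which is the "simultaneously, rather than successively" point made in the abstract.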