Zongming Ma

ORCID: 0000-0003-2401-0177
Research Areas
  • Statistical Methods and Inference
  • Sparse and Compressive Sensing Techniques
  • Random Matrices and Applications
  • Bayesian Methods and Mixture Models
  • Complex Network Analysis Techniques
  • Single-cell and spatial transcriptomics
  • Statistical Methods and Bayesian Inference
  • Functional Brain Connectivity Studies
  • Opinion Dynamics and Social Influence
  • Advanced Statistical Methods and Models
  • Mental Health Research Topics
  • Blind Source Separation Techniques
  • Cell Image Analysis Techniques
  • Image and Signal Denoising Methods
  • Advanced Graph Neural Networks
  • Face and Expression Recognition
  • Advanced Algebra and Geometry
  • Health, Environment, Cognitive Aging
  • Gene Regulatory Network Analysis
  • Graph theory and applications
  • Advanced Combinatorial Mathematics
  • Markov Chains and Monte Carlo Methods
  • Point processes and geometric inequalities
  • Extracellular vesicles in disease
  • Advanced Clustering Algorithms Research

Yale University
2017-2024

Dalhousie University
2024

University of Pennsylvania
2014-2023

California University of Pennsylvania
2013-2022

University of Chicago
2017-2020

Huntsman (United States)
2020

University of California, Davis
2016

Philadelphia University
2016

University of Illinois Urbana-Champaign
2015

University of Pittsburgh
2015

Neurobiological abnormalities associated with psychiatric disorders do not map well to existing diagnostic categories. High co-morbidity suggests dimensional circuit-level abnormalities that cross diagnoses. Here we seek to identify brain-based dimensions of psychopathology using sparse canonical correlation analysis in a sample of 663 youths. This analysis reveals correlated patterns of functional connectivity and psychiatric symptoms. We find that four dimensions of psychopathology (mood, psychosis, fear, and externalizing behavior) are associated (r = 0.68-0.71) with distinct...

10.1038/s41467-018-05317-y article EN cc-by Nature Communications 2018-07-26

Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features $p$ is comparable to, or even much larger than, the sample size $n$. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional...

10.1214/13-aos1097 article EN other-oa The Annals of Statistics 2013-04-01
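
As a rough illustration of the iterative thresholding idea in the abstract above, the following Python sketch alternates a multiplication step, a soft-thresholding step, and a renormalization to estimate a single sparse leading eigenvector. It is a rank-one simplification, not the paper's procedure; the function names, the fixed threshold, the starting rule, and the toy covariance matrix are all assumptions made for this example.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; entries with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_leading_eigenvector(S, threshold, n_iter=100):
    """Thresholded power iteration on a symmetric matrix S (list of lists).

    Hypothetical helper: a rank-one sketch of 'multiply, threshold,
    renormalize', with a fixed user-supplied threshold.
    """
    p = len(S)
    # Assumed starting rule: the coordinate with the largest variance.
    j0 = max(range(p), key=lambda j: S[j][j])
    v = [0.0] * p
    v[j0] = 1.0
    for _ in range(n_iter):
        # Multiplication step: w = S v.
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        # Thresholding step: kill small coordinates to enforce sparsity.
        w = [soft_threshold(x, threshold) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # threshold too large; keep the previous iterate
        v = [x / norm for x in w]
    return v


# Toy spiked covariance (assumed): identity plus a rank-one spike along a
# sparse direction u, plus a small dense off-diagonal perturbation.
p = 8
u = [1 / math.sqrt(3)] * 3 + [0.0] * (p - 3)
S = [[(1.0 if i == j else 0.01) + 9.0 * u[i] * u[j] for j in range(p)]
     for i in range(p)]

v_hat = sparse_leading_eigenvector(S, threshold=0.5)
print([round(x, 4) for x in v_hat])
```

In this toy setting the thresholding step zeroes out the coordinates outside the support of `u`, so the estimate is supported on the first three coordinates. A data-driven threshold and a multi-column (subspace) version would be needed in practice.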

Principal component analysis (PCA) is one of the most commonly used statistical procedures with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy...

10.1214/13-aos1178 article EN other-oa The Annals of Statistics 2013-12-01

This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $\mathbf{x}\in\mathbb{R}^{p}$ from noisy quadratic measurements $y_{j}=(\mathbf{a}_{j}'\mathbf{x})^{2}+\varepsilon_{j}$, $j=1,\ldots,m$, with independent sub-exponential noise $\varepsilon_{j}$. The goals are to understand the effect of the sparsity of $\mathbf{x}$ on the estimation precision and to construct a computationally feasible estimator to achieve the optimal rates adaptively. Inspired by the Wirtinger Flow [IEEE Trans. Inform. Theory 61...

10.1214/16-aos1443 article EN other-oa The Annals of Statistics 2016-09-12
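
The measurement model in the abstract above, $y_{j}=(\mathbf{a}_{j}'\mathbf{x})^{2}+\varepsilon_{j}$, can be simulated in a few lines. The sketch below is only an illustration: the Gaussian design vectors, the Gaussian noise (one example of sub-exponential noise), the function name, and all parameter values are assumptions and are not taken from the paper. It also shows why the signal can be recovered at best up to a global sign: $\mathbf{x}$ and $-\mathbf{x}$ produce identical measurements.

```python
import random


def quadratic_measurements(x, m, noise_scale, rng):
    """Draw m measurements y_j = (a_j' x)^2 + eps_j.

    Hypothetical helper: design vectors a_j and noise eps_j are both
    Gaussian here, which is an assumption made for illustration.
    """
    designs, ys = [], []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in x]
        inner = sum(ai * xi for ai, xi in zip(a, x))
        eps = noise_scale * rng.gauss(0.0, 1.0)
        designs.append(a)
        ys.append(inner ** 2 + eps)
    return designs, ys


# A sparse signal in R^10 with three nonzero coordinates (toy values).
x = [2.0, -1.0, 0.5] + [0.0] * 7
x_neg = [-xi for xi in x]

# Same seed, hence the same design vectors; noise switched off so the
# sign ambiguity shows up as an exact equality.
_, y_pos = quadratic_measurements(x, m=20, noise_scale=0.0, rng=random.Random(0))
_, y_neg = quadratic_measurements(x_neg, m=20, noise_scale=0.0, rng=random.Random(0))

print(y_pos == y_neg)
```

Setting `noise_scale` to a positive value adds the noise term of the model; the sign ambiguity remains, since the noiseless part of each measurement depends on the signal only through $(\mathbf{a}_{j}'\mathbf{x})^{2}$.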

Summary Continuous treatments (e.g. doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods...

10.1111/rssb.12212 article EN Journal of the Royal Statistical Society Series B (Statistical Methodology) 2016-09-30

Abstract The intestine is a complex organ that promotes digestion, extracts nutrients, participates in immune surveillance, maintains critical symbiotic relationships with microbiota and affects overall health. The intestine has a length of over nine metres, along which there are differences in structure and function. The localization of individual cell types, cell type development trajectories and detailed cell transcriptional programs probably drive these differences in function. Here, to better understand these differences, we evaluated...

10.1038/s41586-023-05915-x article EN cc-by Nature 2023-07-19

Community detection is a fundamental statistical problem in network data analysis. In this paper, we present a polynomial time two-stage method that provably achieves optimal performanc...

10.5555/3122009.3153016 article EN arXiv (Cornell University) 2017-01-01

This paper considers a sparse spiked covariance matrix model in the high-dimensional setting and studies the minimax estimation of the covariance matrix and the principal subspace as well as the minimax rank detection. The optimal rate of convergence for estimating the spiked covariance matrix under the spectral norm is established, which requires significantly different techniques from those for estimating other structured covariance matrices such as bandable or sparse covariance matrices. We also establish the minimax rate for estimating the principal subspace, the primary object of interest in principal component analysis. In addition, the rank detection boundary is obtained....

10.1007/s00440-014-0562-z article EN other-oa Probability Theory and Related Fields 2014-04-21

This paper studies the minimax detection of a small submatrix of elevated mean in a large matrix contaminated by additive Gaussian noise. To investigate the tradeoff between statistical performance and computational cost from a complexity-theoretic perspective, we consider a sequence of discretized models which are asymptotically equivalent to the Gaussian model. Under the hypothesis that the planted clique detection problem cannot be solved in randomized polynomial time when the clique size is of smaller order than the square root of the graph size, the following...

10.1214/14-aos1300 article EN other-oa The Annals of Statistics 2015-05-15

Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high dimensional datasets originating from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. Under a Gaussian canonical pair model, we first establish separate minimax estimation rates for the canonical coefficient vectors of each set of random variables under no structural assumption on...

10.1214/16-aos1519 article EN other-oa The Annals of Statistics 2017-10-01

Community detection is a central problem of network data analysis. Given a network, the goal of community detection is to partition the network nodes into a small number of clusters, which could often help reveal interesting structures. The present paper studies community detection in Degree-Corrected Block Models (DCBMs). We first derive asymptotic minimax risks of the problem for a misclassification proportion loss under appropriate conditions. The minimax risks are shown to depend on degree-correction parameters, community sizes and average within and between community connectivities in an intuitive...

10.1214/17-aos1615 article EN The Annals of Statistics 2018-08-17

Abstract Although single-cell and spatial sequencing methods enable simultaneous measurement of more than one biological modality, no technology can capture all modalities within the same cell. For current data integration methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori ‘linked’ features. We describe matching X-modality via fuzzy smoothed embedding (MaxFuse), a method that, through iterative coembedding, data smoothing and cell matching, uses information in each...

10.1038/s41587-023-01935-0 article EN cc-by Nature Biotechnology 2023-09-07

Abstract The ability to align individual cellular information from multiple experimental sources is fundamental for a systems-level understanding of biological processes. However, currently available tools are mainly designed for single-cell transcriptomics matching and integration, and generally rely on a large number of shared features across datasets for cell matching. This approach underperforms when applied to single-cell proteomic datasets due to the limited number of parameters simultaneously accessed and the lack of shared markers across these experiments....

10.1038/s41592-022-01709-7 article EN cc-by Nature Methods 2023-01-09

This paper considers testing a covariance matrix $\Sigma$ in the high dimensional setting where the dimension $p$ can be comparable to or much larger than the sample size $n$. The problem of testing the hypothesis $H_{0}:\Sigma=\Sigma_{0}$ for a given covariance matrix $\Sigma_{0}$ is studied from a minimax point of view. We first characterize the boundary that separates the testable region from the non-testable region by the Frobenius norm when the ratio of the dimension $p$ to the sample size $n$ is bounded. A test based on a $U$-statistic is introduced and is shown to be rate optimal over this asymptotic regime....

10.3150/12-bej455 article EN other-oa Bernoulli 2013-11-01

The distributions of the largest and the smallest eigenvalues of a $p$-variate sample covariance matrix $S$ are of great importance in statistics. Focusing on the null case where $nS$ follows the standard Wishart distribution $W_{p}(I, n)$, we study the accuracy of their scaling limits under the setting: $n/p \to \gamma \in (0, \infty)$ as $n \to \infty$. The limits here are the orthogonal Tracy–Widom law and its reflection about the origin. With carefully chosen rescaling constants, the approximation to the rescaled largest eigenvalue distribution by the limit attains accuracy of order $O(\min(n, p)^{-2/3})$. If $\gamma > 1$, the same order is...

10.3150/10-bej334 article EN other-oa Bernoulli 2012-01-20

10.1007/s00440-020-00997-4 article EN Probability Theory and Related Fields 2020-09-25

Community detection is a fundamental statistical problem in network data analysis. Many algorithms have been proposed to tackle this problem. Most of these algorithms are not guaranteed to achieve the statistical optimality of the problem, while procedures that achieve information theoretic limits for general parameter spaces are not computationally tractable. In this paper, we present a computationally feasible two-stage method that achieves optimal statistical performance in misclassification proportion for the stochastic block model under weak regularity conditions. Our two-stage procedure...

10.48550/arxiv.1505.03772 preprint EN other-oa arXiv (Cornell University) 2015-01-01

In this paper, we study community detection when we observe $m$ sparse networks and a high dimensional covariate matrix, all encoding the same community structure among $n$ subjects. In the asymptotic regime where the number of features $p$ and the number of subjects $n$ grow proportionally, we derive an exact formula for the asymptotic minimum mean square error...

10.1109/tit.2023.3238352 article EN IEEE Transactions on Information Theory 2023-01-20

Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. This paper considers the problem of estimating the leading canonical correlation directions in high-dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many applications involving massive data sets. However, there has been little theoretical justification available in the literature. In this paper, we establish rate-optimal nonasymptotic...

10.1214/15-aos1332 article EN other-oa The Annals of Statistics 2015-09-16

Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high dimensional datasets originating from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. First, we establish separate minimax estimation rates for the canonical coefficient vectors of each set of random variables under no structural assumption on marginal covariance matrices....

10.48550/arxiv.1409.8565 preprint EN other-oa arXiv (Cornell University) 2014-01-01

Abstract We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters, which obviate the need for computationally expensive cross-validation. We also introduce a way to sparsely initialize the algorithm for computational savings that allow our algorithm to outperform...

10.1080/10618600.2013.858632 article EN Journal of Computational and Graphical Statistics 2013-11-21