- Stochastic Gradient Optimization Techniques
- Sparse and Compressive Sensing Techniques
- Reinforcement Learning in Robotics
- Neural Networks and Applications
- Machine Learning and Algorithms
- Advanced Bandit Algorithms Research
- Model Reduction and Neural Networks
- Statistical Methods and Inference
- Machine Learning and ELM
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Adversarial Robustness in Machine Learning
- Machine Learning and Data Classification
- Topic Modeling
- Markov Chains and Monte Carlo Methods
- Advanced Optimization Algorithms Research
- Gaussian Processes and Bayesian Inference
- Natural Language Processing Techniques
- Matrix Theory and Algorithms
- Generative Adversarial Networks and Image Synthesis
- Adaptive Dynamic Programming Control
- Bayesian Modeling and Causal Inference
- Systemic Lupus Erythematosus Research
- Ferroelectric and Negative Capacitance Devices
- Interconnection Networks and Systems
Princeton University
2017-2025
GS Caltex (South Korea)
2023
Harvard University
2023
Thomas Jefferson University
2023
Princeton Public Schools
2019-2021
University of Southern California
2016-2020
Kaiser Permanente
2020
Georgia Institute of Technology
2020
Southern California University for Professional Studies
2016-2019
LAC+USC Medical Center
2018-2019
We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form confidence intervals for the selected coefficients and to test whether all relevant variables have been included in the model.
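The central object here is the law of a post-selection statistic *given* the selection event. The sketch below is mine, not the paper's construction: it uses an orthogonal design, where the lasso reduces to coordinate-wise soft-thresholding, and approximates the conditional null law of a selected coefficient by rejection sampling, comparing the resulting "selective" p-value to the naive one. All constants are illustrative; the paper derives this conditional distribution in closed form, with no simulation needed.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
n, p, sigma, lam = 100, 10, 1.0, 0.2
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = np.sqrt(n) * Q                        # orthogonal design: X^T X = n * I
beta_true = np.zeros(p); beta_true[0] = 0.3
y = X @ beta_true + sigma * rng.standard_normal(n)

z_obs = X.T @ y / n                       # with this design, the lasso soft-thresholds z
selected = np.flatnonzero(np.abs(z_obs) > lam)
j = int(selected[0]) if selected.size else int(np.argmax(np.abs(z_obs)))
stat_obs = abs(z_obs[j])

# Naive two-sided p-value that ignores the fact that j was chosen by the data.
p_naive = 1 - erf(stat_obs * np.sqrt(n) / (sigma * np.sqrt(2)))

# Null law of |z_j| *conditioned on variable j being selected*, approximated by
# rejection sampling under the global null (the paper gives this law exactly).
z_null = (sigma * rng.standard_normal((20000, n)) @ X) / n
accepted = np.abs(z_null[:, j])[np.abs(z_null[:, j]) > lam]
p_selective = np.mean(accepted >= stat_obs) if accepted.size else float("nan")
print(f"selected variable {j}: naive p = {p_naive:.3f}, selective p = {p_selective:.3f}")
```

For a marginal signal the selective p-value is noticeably larger than the naive one, which is exactly the selection effect the framework corrects for.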
Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of the source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline on WMT’15...
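A minimal sketch of the encoder idea described above: character embeddings pass through a convolution that captures local character n-grams, max-pooling over time shortens the sequence, and a recurrent layer runs over the shorter representation. The hyperparameters and the single conv layer are illustrative simplifications, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    def __init__(self, vocab_size=300, emb=128, channels=256, width=5, pool=4, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.conv = nn.Conv1d(emb, channels, kernel_size=width, padding=width // 2)
        self.pool = nn.MaxPool1d(pool)               # stride-`pool` max-pooling shrinks length
        self.rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                     # (batch, src_len) character ids
        x = self.embed(char_ids).transpose(1, 2)     # (batch, emb, src_len)
        x = torch.relu(self.conv(x))                 # local character n-gram features
        x = self.pool(x).transpose(1, 2)             # (batch, src_len / pool, channels)
        states, _ = self.rnn(x)                      # contextual states over the shorter sequence
        return states

enc = CharConvEncoder()
out = enc(torch.randint(0, 300, (2, 64)))            # e.g. 64 source characters
print(out.shape)                                     # -> torch.Size([2, 16, 512])
```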
We present a communication-efficient surrogate likelihood (CSL) framework for solving distributed statistical inference problems. CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation, and Bayesian inference. For low-dimensional estimation, CSL provably improves upon naive averaging schemes and facilitates the construction of confidence intervals. For high-dimensional regularized estimation, CSL leads to a minimax-optimal estimator with controlled communication cost. For Bayesian inference, CSL can be used to form a quasi-posterior distribution that converges to the true posterior....
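A toy sketch of the surrogate-loss construction, under my own assumptions (logistic regression, machine 1 as the coordinator, helper names are mine): machine 1 keeps its full local loss, and one round of gradient communication adds a linear correction that aligns the surrogate with the global loss.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, n_per, d = 10, 500, 5                       # machines, samples per machine, dimension
theta_star = rng.standard_normal(d)
Xs = [rng.standard_normal((n_per, d)) for _ in range(m)]
ys = [(rng.random(n_per) < 1 / (1 + np.exp(-X @ theta_star))).astype(float) for X in Xs]

def loss(theta, X, y):
    z = X @ theta
    return np.mean(np.log1p(np.exp(z)) - y * z)

def grad(theta, X, y):
    p = 1 / (1 + np.exp(-(X @ theta)))
    return X.T @ (p - y) / len(y)

# Step 1: initial estimator theta_bar (here: minimizer of machine 1's local loss).
theta_bar = minimize(loss, np.zeros(d), args=(Xs[0], ys[0]), jac=grad).x
# Step 2: one communication round -- every machine sends its local gradient at theta_bar.
global_grad = np.mean([grad(theta_bar, X, y) for X, y in zip(Xs, ys)], axis=0)
# Step 3: machine 1 minimizes the surrogate loss (local loss plus a linear correction).
shift = grad(theta_bar, Xs[0], ys[0]) - global_grad
surrogate = lambda th: loss(th, Xs[0], ys[0]) - shift @ th
surrogate_grad = lambda th: grad(th, Xs[0], ys[0]) - shift
theta_csl = minimize(surrogate, theta_bar, jac=surrogate_grad).x

theta_avg = np.mean([minimize(loss, np.zeros(d), args=(X, y), jac=grad).x
                     for X, y in zip(Xs, ys)], axis=0)   # naive averaging baseline
print(np.linalg.norm(theta_csl - theta_star), np.linalg.norm(theta_avg - theta_star))
```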
In this paper, we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations is fewer than the number of parameters in the model. We show that, with quadratic activations, the optimization landscape of training such networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for an arbitrary training data set of input/output pairs. For...
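A small numerical companion to the regime described above, under my own illustrative choices: a one-hidden-layer network with quadratic activations and fixed ±1 output weights, more parameters than observations, trained by plain gradient descent from a random start on arbitrary labels.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 40, 10, 20                        # k*d = 200 parameters > n = 40 observations
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                  # arbitrary labels
v = np.concatenate([np.ones(k // 2), -np.ones(k // 2)])   # fixed output layer

W = 0.1 * rng.standard_normal((k, d))       # random initialization
lr = 2e-3
for t in range(50000):
    H = X @ W.T                             # (n, k) hidden pre-activations
    pred = (H ** 2) @ v                     # quadratic activation
    resid = pred - y
    grad_W = 4 * ((H * v) * resid[:, None]).T @ X / n
    W -= lr * grad_W
print("final train MSE:", np.mean(resid ** 2))   # typically driven to ~0, matching the landscape result
```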
Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving that various algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices. We prove that the commonly used objective function for \textit{positive semidefinite} matrix completion has no spurious local minima --- all local minima must also be...
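A small experiment in the spirit of this claim, with sizes and step size that are illustrative only: gradient descent on the usual non-convex factored objective for positive semidefinite matrix completion, started from an arbitrary random point rather than a spectral initialization.

```python
import numpy as np

rng = np.random.default_rng(3)
d, r, obs_prob = 50, 3, 0.3
U_star = rng.standard_normal((d, r))
M_star = U_star @ U_star.T                       # ground-truth PSD rank-r matrix
mask = rng.random((d, d)) < obs_prob
mask = np.triu(mask) | np.triu(mask).T           # symmetric observation pattern

X = rng.standard_normal((d, r))                  # arbitrary (non-spectral) initialization
lr = 2e-3
for t in range(5000):
    R = mask * (X @ X.T - M_star)                # residual on observed entries only
    X -= lr * 4 * R @ X                          # gradient of the squared residuals on the mask
rel_err = np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star)
print("relative recovery error:", rel_err)       # typically small despite random initialization
```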
We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping. We show that the resulting proximal Newton-type methods inherit the desirable convergence behavior of Newton-type methods for minimizing smooth functions, even when the search directions are computed inexactly. Many popular methods tailored to problems arising in bioinformatics, signal processing, and statistical learning are special cases of proximal Newton-type methods, and our analysis yields new results for some of these methods.
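A sketch of one proximal Newton-type iteration for this composite structure, using l1-regularized logistic regression as the smooth-plus-prox example. The inner subproblem is solved inexactly by a few proximal-gradient steps, echoing the inexact search directions discussed above; the problem sizes and the absence of a line search are my simplifications.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, lam = 200, 20, 0.05
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:3] = [2.0, -1.5, 1.0]
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def smooth_grad_hess(w):
    p = 1 / (1 + np.exp(-(X @ w)))
    g = X.T @ (p - y) / n
    H = (X.T * (p * (1 - p))) @ X / n
    return g, H

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w = np.zeros(d)
for outer in range(15):
    g, H = smooth_grad_hess(w)
    L = np.linalg.eigvalsh(H).max()          # step size for the inner prox-gradient solver
    z = w.copy()
    for inner in range(50):                  # inexact solve of: min_z g.(z-w) + 0.5 (z-w)' H (z-w) + lam ||z||_1
        model_grad = g + H @ (z - w)
        z = soft(z - model_grad / L, lam / L)
    w = z                                    # unit step; a line search would go here
obj = np.mean(np.log1p(np.exp(X @ w)) - y * (X @ w)) + lam * np.abs(w).sum()
print("objective:", obj, "nonzeros:", np.count_nonzero(w))
```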
Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show that the Gram matrix is stable throughout the training process, and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional networks and obtain a similar...
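A tiny experiment in the spirit of this result: full-batch gradient descent on an over-parameterized network with residual connections, driven to near-zero training loss on random targets. Width, depth, learning rate, and the number of steps are illustrative choices of mine, not the paper's setting.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, width, depth = 20, 10, 512, 4
X = torch.randn(n, d)
y = torch.randn(n, 1)                             # arbitrary real-valued targets

class ResBlock(nn.Module):
    def __init__(self, w):
        super().__init__()
        self.lin = nn.Linear(w, w)
    def forward(self, h):
        return h + torch.relu(self.lin(h))        # residual connection

model = nn.Sequential(nn.Linear(d, width),
                      *[ResBlock(width) for _ in range(depth)],
                      nn.Linear(width, 1))
opt = torch.optim.SGD(model.parameters(), lr=5e-3)   # plain full-batch gradient descent
for step in range(5000):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
print("final training loss:", loss.item())        # typically very close to zero
```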
We consider the problem of learning the structure of a pairwise graphical model over continuous and discrete variables. We present a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning. In previous work, authors have considered structure learning of Gaussian graphical models and of discrete graphical models separately. Our approach is a natural generalization of these two lines of work to the mixed case. The penalization scheme involves a novel symmetric use of the group-lasso norm and follows naturally from a particular parametrization of the model. Supplementary materials for this paper are available online.
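The group-lasso penalty mentioned above ties all parameters belonging to a single edge together, so an edge is kept or dropped as a unit (for a continuous-discrete edge, the whole vector over the discrete variable's levels). A minimal sketch of the corresponding proximal step, with an illustrative block layout rather than the paper's exact parametrization:

```python
import numpy as np

def group_soft_threshold(theta, groups, t):
    """Prox of t * sum_g ||theta_g||_2: shrink each edge's parameter block jointly."""
    out = theta.copy()
    for idx in groups:
        norm = np.linalg.norm(theta[idx])
        out[idx] = 0.0 if norm <= t else (1 - t / norm) * theta[idx]
    return out

# Example: each edge owns a 3-vector block (e.g. a continuous variable paired with
# a 3-level discrete variable); the weak second edge is zeroed out entirely.
theta = np.array([0.9, -0.1, 0.3,   0.02, -0.01, 0.03])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(group_soft_threshold(theta, groups, t=0.2))
```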
We devise a communication-efficient approach to distributed sparse regression in the high-dimensional setting. The key idea is to average debiased or desparsified lasso estimators. We show the approach converges at the same rate as the lasso as long as the dataset is not split across too many machines, and consistently estimates the support under weaker conditions than the lasso. On the computational side, we propose a new parallel and computationally-efficient algorithm to compute the approximate inverse covariance required by the debiasing approach when the dataset is split across samples....
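A toy sketch of the averaging idea: each machine computes a debiased lasso estimate locally, and only those d-dimensional vectors are communicated and averaged. To stay short, this toy gives each machine enough samples to invert its sample covariance directly; the paper instead constructs an approximate inverse covariance suited to the genuinely high-dimensional case, via the parallel algorithm it describes.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
m, n_per, d, sigma = 8, 200, 50, 1.0
beta_star = np.zeros(d); beta_star[:5] = [1.0, -1.0, 0.5, -0.5, 0.25]

debiased = []
for k in range(m):
    X = rng.standard_normal((n_per, d))
    y = X @ beta_star + sigma * rng.standard_normal(n_per)
    b = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
    Theta = np.linalg.inv(X.T @ X / n_per)          # toy stand-in for the approximate inverse covariance
    debiased.append(b + Theta @ X.T @ (y - X @ b) / n_per)

beta_avg = np.mean(debiased, axis=0)                # a single round of communication
print("error of averaged debiased lasso:", np.linalg.norm(beta_avg - beta_star))
```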
We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned. When the labels are the outputs from a teacher network of the same architecture with fixed weights $(\mathbf{w}^*, \mathbf{a}^*)$, we prove that with Gaussian input $\mathbf{Z}$, there is a spurious local minimizer. Surprisingly, in the presence...
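A teacher-student toy for this model, under my own illustrative sizes: non-overlapping patches $\mathbf{Z}_j$ share one filter $\mathbf{w}$, per-patch output weights $\mathbf{a}$ are trained jointly, and (stochastic) gradient descent runs on fresh Gaussian inputs. Consistent with the abstract, a run may stall at a spurious minimizer; many random starts instead drive the prediction error toward zero.

```python
import numpy as np

rng = np.random.default_rng(6)
k, p = 8, 6                                   # k non-overlapping patches of dimension p
w_star = rng.standard_normal(p); w_star /= np.linalg.norm(w_star)
a_star = rng.standard_normal(k)
relu = lambda t: np.maximum(t, 0.0)
predict = lambda Z, w, a: relu(Z @ w) @ a     # sum_j a_j * relu(w^T Z_j), Z: (batch, k, p)

w = 0.5 * rng.standard_normal(p)              # random initialization of both parameter sets
a = 0.5 * rng.standard_normal(k)
lr = 0.02
for step in range(20000):
    Z = rng.standard_normal((64, k, p))       # fresh Gaussian input patches
    H = Z @ w                                 # (64, k) filter responses per patch
    resid = relu(H) @ a - predict(Z, w_star, a_star)
    grad_a = relu(H).T @ resid / 64
    grad_w = np.einsum('bk,bkp->p', (H > 0) * a * resid[:, None], Z) / 64
    a -= lr * grad_a
    w -= lr * grad_w
Z_test = rng.standard_normal((5000, k, p))
gap = predict(Z_test, w, a) - predict(Z_test, w_star, a_star)
print("relative test error:",
      np.linalg.norm(gap) / np.linalg.norm(predict(Z_test, w_star, a_star)))
```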