Jason D. Lee

ORCID: 0000-0003-0064-7800
Research Areas
  • Stochastic Gradient Optimization Techniques
  • Sparse and Compressive Sensing Techniques
  • Reinforcement Learning in Robotics
  • Neural Networks and Applications
  • Machine Learning and Algorithms
  • Advanced Bandit Algorithms Research
  • Model Reduction and Neural Networks
  • Statistical Methods and Inference
  • Machine Learning and ELM
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Adversarial Robustness in Machine Learning
  • Machine Learning and Data Classification
  • Topic Modeling
  • Markov Chains and Monte Carlo Methods
  • Advanced Optimization Algorithms Research
  • Gaussian Processes and Bayesian Inference
  • Natural Language Processing Techniques
  • Matrix Theory and Algorithms
  • Generative Adversarial Networks and Image Synthesis
  • Adaptive Dynamic Programming Control
  • Bayesian Modeling and Causal Inference
  • Systemic Lupus Erythematosus Research
  • Ferroelectric and Negative Capacitance Devices
  • Interconnection Networks and Systems

Princeton University
2017-2025

GS Caltex (South Korea)
2023

Harvard University
2023

Thomas Jefferson University
2023

Princeton Public Schools
2019-2021

University of Southern California
2016-2020

Kaiser Permanente
2020

Georgia Institute of Technology
2020

Southern California University for Professional Studies
2016-2019

LAC+USC Medical Center
2018-2019

We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form confidence intervals for the selected coefficients and to test whether all relevant variables have been included in the model.

10.1214/15-aos1371 article EN other-oa The Annals of Statistics 2016-04-11
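
The conditioning result at the core of this framework is often summarized, for Gaussian data and an affine selection event, roughly as follows (generic notation, a sketch rather than the paper's exact statement):

```latex
% y ~ N(mu, Sigma); the selection event (e.g., the lasso's active set and signs)
% is written as an affine set {Ay <= b}. Conditional on selection, a linear
% contrast of y is a truncated Gaussian on a computable interval:
\[
  \eta^\top y \,\bigm|\, \{Ay \le b\}
  \;\sim\;
  \mathrm{TN}\!\bigl(\eta^\top\mu,\; \eta^\top\Sigma\eta,\; [\mathcal{V}^-(y),\, \mathcal{V}^+(y)]\bigr).
\]
% Inverting the truncated-Gaussian CDF in eta^T mu then yields confidence intervals
% for selected coefficients that remain valid after model selection.
```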

Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of the source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline on WMT’15...

10.1162/tacl_a_00067 article EN cc-by Transactions of the Association for Computational Linguistics 2017-12-01
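
As a rough illustration of the encoder idea (character embeddings, a convolution over characters, and max-pooling that shortens the sequence before a recurrent layer), here is a minimal PyTorch sketch; the layer sizes, kernel width, and pooling stride are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    def __init__(self, vocab_size=128, emb=64, channels=256, pool_stride=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.conv = nn.Conv1d(emb, channels, kernel_size=5, padding=2)
        self.pool = nn.MaxPool1d(kernel_size=pool_stride, stride=pool_stride)
        self.rnn = nn.GRU(channels, channels, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                  # (batch, seq_len) character indices
        x = self.embed(char_ids).transpose(1, 2)  # (batch, emb, seq_len)
        x = torch.relu(self.conv(x))              # local character n-gram features
        x = self.pool(x)                          # sequence shortened by pool_stride
        out, _ = self.rnn(x.transpose(1, 2))      # contextualize the shorter sequence
        return out

enc = CharConvEncoder()
print(enc(torch.randint(0, 128, (2, 100))).shape)  # torch.Size([2, 20, 512])
```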

We present a communication-efficient surrogate likelihood (CSL) framework for solving distributed statistical inference problems. CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation, and Bayesian inference. For low-dimensional estimation, CSL provably improves upon naive averaging schemes and facilitates the construction of confidence intervals. For high-dimensional regularized estimation, CSL leads to a minimax-optimal estimator with controlled communication cost. For Bayesian inference, CSL can be used to form a quasi-posterior distribution that converges to the true posterior....

10.1080/01621459.2018.1429274 article EN Journal of the American Statistical Association 2018-02-05
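
A minimal sketch of the surrogate-likelihood idea for distributed least squares, assuming the usual one-round construction: a pilot estimate, a single communication of local gradients, and a local loss shifted so that its gradient at the pilot matches the global gradient. The toy data and all variable names are mine.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K, n, d = 5, 200, 10                       # machines, samples per machine, dimension
theta_star = rng.normal(size=d)
Xs = [rng.normal(size=(n, d)) for _ in range(K)]
ys = [X @ theta_star + 0.1 * rng.normal(size=n) for X in Xs]

loss = lambda t, X, y: 0.5 * np.mean((X @ t - y) ** 2)
grad = lambda t, X, y: X.T @ (X @ t - y) / len(y)

theta_bar = np.linalg.lstsq(Xs[0], ys[0], rcond=None)[0]   # pilot estimate on machine 1
global_grad = np.mean([grad(theta_bar, X, y) for X, y in zip(Xs, ys)], axis=0)  # one communication round
local_grad = grad(theta_bar, Xs[0], ys[0])

# Surrogate: local loss on machine 1, corrected so its gradient at theta_bar
# equals the global gradient; it is then minimized locally without further communication.
surrogate = lambda t: loss(t, Xs[0], ys[0]) - (local_grad - global_grad) @ t
theta_csl = minimize(surrogate, theta_bar).x
print(np.linalg.norm(theta_csl - theta_star))              # typically closer to the truth than the pilot
```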

In this paper, we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations is fewer than the number of parameters in the model. We show that with quadratic activations, the optimization landscape of training such shallow networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for an arbitrary set of input/output pairs. For...

10.1109/tit.2018.2854560 article EN publisher-specific-oa IEEE Transactions on Information Theory 2018-07-10
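
A small numerical illustration under assumptions of my own choosing: a one-hidden-layer network with quadratic activation, f(x) = sum_j v_j (w_j^T x)^2, in an over-parameterized regime (k*d > n), fit by a generic local search method (L-BFGS) from a random start.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, k = 30, 10, 20                         # samples, input dim, hidden width (k*d = 200 > n)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)                       # arbitrary real-valued labels
v = np.tile([1.0, -1.0], k // 2)             # fixed output weights of both signs

def loss_and_grad(w_flat):
    W = w_flat.reshape(k, d)
    Z = X @ W.T                              # (n, k) pre-activations w_j^T x_i
    resid = (Z ** 2) @ v - y                 # quadratic activation, then linear output
    g = (2.0 / n) * v[:, None] * ((Z * resid[:, None]).T @ X)
    return 0.5 * np.mean(resid ** 2), g.ravel()

w0 = rng.normal(size=k * d) / np.sqrt(d)     # random initialization
res = minimize(loss_and_grad, w0, jac=True, method="L-BFGS-B")
print(res.fun)                               # typically near zero: local search reaches a global fit
```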

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving that various algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices. We prove that the commonly used non-convex objective function for \textit{positive semidefinite} matrix completion has no spurious local minima --- all local minima must also be...

10.48550/arxiv.1605.07272 preprint EN other-oa arXiv (Cornell University) 2016-01-01
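
A toy sketch of the setting, with illustrative sizes and step size: gradient descent on the factorized objective for positive semidefinite matrix completion, started from a random point rather than a careful initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, p = 40, 3, 0.5                          # matrix size, rank, observation probability
U_star = rng.normal(size=(d, r)) / np.sqrt(d)
M = U_star @ U_star.T                         # ground-truth low-rank PSD matrix
mask = rng.random((d, d)) < p
mask = np.triu(mask) | np.triu(mask).T        # symmetric set of observed entries

U = rng.normal(size=(d, r)) / np.sqrt(d)      # random initialization, no spectral warm start
lr = 0.05
for _ in range(3000):
    R = mask * (U @ U.T - M)                  # residuals on observed entries only
    U -= lr * (4 * (R @ U))                   # gradient of sum over observed (i,j) of ((UU^T - M)_ij)^2
rel_err = np.linalg.norm(mask * (U @ U.T - M)) / np.linalg.norm(mask * M)
print(rel_err)                                # small in typical runs: random initialization suffices here
```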

We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping. We show that the resulting proximal Newton-type methods inherit the desirable convergence behavior of Newton-type methods for minimizing smooth functions, even when the search directions are computed inexactly. Many popular methods tailored to problems arising in bioinformatics, signal processing, and statistical learning are special cases of these methods, and our analysis yields new results for some of these methods.

10.1137/130921428 article EN SIAM Journal on Optimization 2014-01-01
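
A hedged sketch of one such method: a proximal Newton iteration for l1-regularized logistic regression, where the smooth term is modeled by a local quadratic and the subproblem (quadratic plus l1 penalty) is solved approximately by an inner proximal-gradient loop. The step sizes, inner solver, and iteration counts are simple illustrative choices, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 20, 0.05
X = rng.normal(size=(n, d))
w_true = np.zeros(d); w_true[:3] = [2.0, -2.0, 1.5]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def smooth_grad_hess(w):                              # gradient and Hessian of the logistic loss
    prob = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (prob - y) / n, (X.T * (prob * (1 - prob))) @ X / n + 1e-8 * np.eye(d)

soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)   # proximal map of t*||.||_1

w = np.zeros(d)
for _ in range(20):                                   # outer proximal Newton iterations
    g, H = smooth_grad_hess(w)
    L = np.linalg.eigvalsh(H)[-1]                     # step-size constant for the quadratic model
    z = w.copy()
    for _ in range(100):                              # inner proximal-gradient solve of the subproblem
        z = soft(z - (g + H @ (z - w)) / L, lam / L)
    w = z                                             # unit step (in general a line search is used)

obj = np.mean(np.log1p(np.exp(X @ w)) - y * (X @ w)) + lam * np.abs(w).sum()
print(obj, np.count_nonzero(np.round(w, 6)))          # composite objective value and sparsity of the solution
```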

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the architecture. This structure allows us to show that the Gram matrix is stable throughout the training process, and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to convolutional architectures and obtain similar...

10.48550/arxiv.1811.03804 preprint EN other-oa arXiv (Cornell University) 2018-01-01
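
A toy numerical illustration of the phenomenon (my own construction, using a two-layer ReLU network rather than the residual and convolutional architectures the paper analyzes): full-batch gradient descent on a heavily over-parameterized network drives the training loss toward zero from a random Gaussian initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                          # samples, input dim, hidden width (m >> n)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
y = rng.normal(size=n)

W = rng.normal(size=(m, d))                    # trained hidden layer
a = rng.choice([-1.0, 1.0], size=m)            # fixed output layer
lr = 0.1
for _ in range(3000):
    Z = X @ W.T                                # (n, m) pre-activations
    resid = np.maximum(Z, 0.0) @ a / np.sqrt(m) - y        # network outputs minus labels
    # gradient of 0.5 * mean(resid^2) with respect to W
    grad_W = ((resid[:, None] * (Z > 0) * a[None, :]).T @ X) / (np.sqrt(m) * n)
    W -= lr * grad_W
print(0.5 * np.mean(resid ** 2))               # close to zero in typical runs
```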

We consider the problem of learning the structure of a pairwise graphical model over continuous and discrete variables. We present a new pairwise model with both continuous and discrete variables that is amenable to structure learning. In previous work, authors have considered Gaussian graphical models and discrete graphical models separately. Our approach is a natural generalization of these two lines of work to the mixed case. The penalization scheme involves a novel symmetric use of the group-lasso norm and follows naturally from a particular parametrization of the model. Supplementary materials for this paper are available online.

10.1080/10618600.2014.900500 article EN Journal of Computational and Graphical Statistics 2014-04-04
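
For context, pairwise models over mixed continuous and discrete variables are commonly written in a conditional-Gaussian form along the following lines (an illustrative shape, not verbatim the paper's parametrization); each edge then corresponds to a group of parameters, which is what makes an edge-wise group-lasso penalty natural:

```latex
\[
  p(x, y) \;\propto\; \exp\Bigl(
    -\tfrac{1}{2}\sum_{s,t} \beta_{st}\, x_s x_t
    \;+\; \sum_{s} \alpha_s\, x_s
    \;+\; \sum_{s,j} \rho_{sj}(y_j)\, x_s
    \;+\; \sum_{r,j} \phi_{rj}(y_r, y_j)
  \Bigr)
\]
% x: continuous variables, y: discrete variables. A continuous-continuous edge carries a
% scalar beta_{st}, a continuous-discrete edge a vector rho_{sj}(.), and a discrete-discrete
% edge a table phi_{rj}(.,.), so penalizing edges means penalizing parameter groups.
```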

We devise a communication-efficient approach to distributed sparse regression in the high-dimensional setting. The key idea is to average debiased or desparsified lasso estimators. We show the approach converges at the same rate as the lasso as long as the dataset is not split across too many machines, and it consistently estimates the support under weaker conditions than the lasso. On the computational side, we propose a new parallel and computationally efficient algorithm to compute the approximate inverse covariance required in the debiasing approach when the data are split across samples....

10.5555/3122009.3122014 article EN Journal of Machine Learning Research 2017-01-01
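
A hedged sketch of the averaging idea: each machine fits a lasso on its local data, forms a debiased (desparsified) estimate, and the debiased estimates are averaged. The ridge-regularized inverse covariance below is a crude stand-in for the parallel approximate-inverse algorithm the abstract refers to; the data, tuning parameters, and names are mine.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
K, n, d, s = 10, 200, 500, 5                 # machines, local samples, dimension, sparsity
beta = np.zeros(d); beta[:s] = 1.0

debiased = []
for _ in range(K):
    X = rng.normal(size=(n, d))
    y = X @ beta + 0.5 * rng.normal(size=n)
    b = Lasso(alpha=0.1).fit(X, y).coef_                   # local lasso fit
    Theta = np.linalg.inv(X.T @ X / n + 0.5 * np.eye(d))   # crude approximate inverse covariance
    debiased.append(b + Theta @ X.T @ (y - X @ b) / n)     # debiased / desparsified lasso estimate
avg = np.mean(debiased, axis=0)                            # one round of communication: average

support_est = np.sort(np.argsort(np.abs(avg))[-s:])
print(np.linalg.norm(avg - beta), support_est)             # error and recovered support (truth: 0..4)
```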

We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j \sigma(\mathbf{w}^\top \mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned. When the labels are the outputs from a teacher network of the same architecture with fixed weights $(\mathbf{w}^*, \mathbf{a}^*)$, we prove that with Gaussian input $\mathbf{Z}$, there is a spurious local minimizer. Surprisingly, in the presence...

10.48550/arxiv.1712.00779 preprint EN other-oa arXiv (Cornell University) 2017-01-01
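
A minimal sketch of the architecture described here: non-overlapping patches Z_j of the input share a single filter w, pass through a ReLU, and are combined by output weights a. The teacher-student setup and plain gradient descent below are illustrative choices of mine, not necessarily the exact procedure analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, n = 8, 6, 512                  # number of non-overlapping patches, patch size, examples
Z = rng.normal(size=(n, k, p))       # Gaussian input, one row of patches per example

net = lambda Z, w, a: np.maximum(Z @ w, 0.0) @ a        # sum_j a_j * relu(w^T Z_j)

w_star, a_star = rng.normal(size=p), rng.normal(size=k)
y = net(Z, w_star, a_star)                              # labels from a fixed teacher (w*, a*)

w, a = rng.normal(size=p), rng.normal(size=k)           # student parameters to be learned
lr = 0.05
for _ in range(3000):
    H = np.maximum(Z @ w, 0.0)                          # (n, k) patch activations
    resid = H @ a - y
    grad_a = H.T @ resid / n
    grad_w = (Z * ((Z @ w > 0) * a)[..., None]).sum(axis=1).T @ resid / n
    a -= lr * grad_a
    w -= lr * grad_w
print(0.5 * np.mean(resid ** 2))     # small when descent avoids the spurious minimizer noted above
```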