High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS)
DOI: 10.48550/arxiv.1902.10710 Publication Date: 2019-01-01
Comments: This is a follow-up to, and has minor text overlap with, arXiv:1812.09859; v2: minor revision following acceptance for presentation at COLT 2019.
ABSTRACT
Algorithmic stability is a classical approach to understanding and analyzing the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability was established in a landmark paper of Bousquet and Elisseeff (2002), albeit at the expense of an additional $\sqrt{n}$ factor in the bound. Specifically, their bound on the estimation error of any $\gamma$-uniformly stable learning algorithm on $n$ samples and range in $[0,1]$ is $O(\gamma\sqrt{n \log(1/\delta)} + \sqrt{\log(1/\delta)/n})$ with probability $\geq 1-\delta$. The $\sqrt{n}$ overhead makes the bound vacuous in the common settings where $\gamma \geq 1/\sqrt{n}$. A stronger bound was recently proved by the authors (Feldman and Vondrak, 2018) that reduces the overhead to at most $O(n^{1/4})$. Still, both of these results give optimal generalization bounds only when $\gamma = O(1/n)$. We prove a nearly tight bound of $O(\gamma\log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n})$ on the estimation error of any $\gamma$-uniformly stable algorithm. It implies that for algorithms that are uniformly stable with $\gamma = O(1/\sqrt{n})$, the estimation error is essentially the same as the sampling error. Our result leads to the first high-probability generalization bounds with nearly optimal rate for multi-pass stochastic gradient descent and regularized ERM for stochastic convex problems, resolving open problems in prior work. Our proof technique is new, and we introduce several analysis tools that might find additional applications.
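To make the scaling concrete, the short Python sketch below evaluates the three estimation-error bounds quoted in the abstract at a stability level of $\gamma = 1/\sqrt{n}$. The concrete values of $\gamma$ and $\delta$, the form used for the Feldman-Vondrak (2018) bound, and the hidden constants (all set to 1) are assumptions made for this illustration, not figures from the paper; the point is only that the Bousquet-Elisseeff bound stays bounded away from zero at this stability level, while the new bound decays at nearly the $1/\sqrt{n}$ sampling rate.

# Illustrative comparison of the three estimation-error bounds quoted above,
# evaluated with all hidden constants set to 1 at stability gamma = 1/sqrt(n).
# The choices of gamma, delta, and constants are assumptions for this sketch.
import math

def bousquet_elisseeff(n, gamma, delta):
    # O(gamma * sqrt(n * log(1/delta)) + sqrt(log(1/delta)/n))
    return gamma * math.sqrt(n * math.log(1 / delta)) + math.sqrt(math.log(1 / delta) / n)

def feldman_vondrak_2018(n, gamma, delta):
    # Overhead reduced to at most O(n^{1/4}); assumed form for illustration only.
    return gamma * n ** 0.25 * math.sqrt(math.log(1 / delta)) + math.sqrt(math.log(1 / delta) / n)

def this_paper(n, gamma, delta):
    # O(gamma * log(n) * log(n/delta) + sqrt(log(1/delta)/n))
    return gamma * math.log(n) * math.log(n / delta) + math.sqrt(math.log(1 / delta) / n)

delta = 0.05
for n in (10**3, 10**5, 10**7):
    gamma = 1 / math.sqrt(n)                        # stability level where older bounds are vacuous
    sampling = math.sqrt(math.log(1 / delta) / n)   # baseline sampling error
    print(f"n={n:>9}  sampling={sampling:.4f}  "
          f"BE02={bousquet_elisseeff(n, gamma, delta):.4f}  "
          f"FV18={feldman_vondrak_2018(n, gamma, delta):.4f}  "
          f"new={this_paper(n, gamma, delta):.4f}")

Running the sketch shows the Bousquet-Elisseeff term remaining at roughly $\sqrt{\log(1/\delta)}$ regardless of $n$, the 2018 bound shrinking at rate $n^{-1/4}$, and the new bound tracking the sampling error up to logarithmic factors.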