SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning

Keywords: Maxima and minima; Stochastic Gradient Descent; Deep Neural Networks
DOI: 10.48550/arxiv.1805.07898 Publication Date: 2018-01-01
ABSTRACT
In Deep Learning, Stochastic Gradient Descent (SGD) is usually selected as the training method because of its efficiency; however, a problem in SGD has recently gained research interest: sharp minima in Deep Neural Networks (DNNs) have poor generalization, and large-batch SGD in particular tends to converge to sharp minima. It remains an open question whether escaping sharp minima can improve generalization. To answer this question, we propose the SmoothOut framework to smooth out sharp minima in DNNs and thereby improve generalization. In a nutshell, SmoothOut perturbs multiple copies of a DNN by noise injection and averages these copies. Injecting noise is widely used in the literature, but SmoothOut differs in several ways: (1) a de-noising process is applied before parameter updating; (2) noise strength is adapted to the filter norm; (3) an alternative interpretation of the advantage of noise injection is given, from the perspective of sharpness and generalization; (4) uniform noise is used instead of Gaussian noise. We prove that SmoothOut can eliminate sharp minima. Because training multiple DNN copies is inefficient, we further propose an unbiased stochastic SmoothOut, which only introduces the overhead of injecting and de-noising noise once per batch. An adaptive variant of SmoothOut, AdaSmoothOut, is also proposed. In a variety of experiments, SmoothOut and AdaSmoothOut consistently improve generalization in both small-batch and large-batch training on top of state-of-the-art solutions.
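
For concreteness, below is a minimal sketch of the per-batch perturb / compute-gradient / de-noise / update loop that the abstract describes, assuming a PyTorch-style training setup. The helper name smoothout_step, the noise range a, and the plain optimizer step are illustrative assumptions rather than the authors' reference implementation; the adaptive variant would additionally scale the noise range by each filter's norm.

    # Hypothetical sketch of a stochastic-SmoothOut-style update (not the paper's code).
    import torch

    def smoothout_step(model, loss_fn, batch, optimizer, a=0.01):
        """One update: inject uniform noise, backprop, de-noise, then step."""
        inputs, targets = batch

        # 1) Inject uniform noise U(-a, a) into every parameter and remember it.
        noises = []
        with torch.no_grad():
            for p in model.parameters():
                noise = torch.empty_like(p).uniform_(-a, a)
                p.add_(noise)
                noises.append(noise)

        # 2) Compute gradients at the perturbed parameters.
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()

        # 3) De-noise *before* the parameter update: remove the injected noise
        #    so the optimizer steps from the original weights using the
        #    gradients obtained at the perturbed point.
        with torch.no_grad():
            for p, noise in zip(model.parameters(), noises):
                p.sub_(noise)

        # 4) Apply the update.
        optimizer.step()
        return loss.item()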