SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning

Keywords: Maxima and minima; Stochastic Gradient Descent; Deep Neural Networks
DOI: 10.48550/arxiv.1805.07898 Publication Date: 2018-01-01
ABSTRACT
In Deep Learning, Stochastic Gradient Descent (SGD) is usually selected as the training method because of its efficiency; however, a problem in SGD has recently gained research interest: sharp minima in Deep Neural Networks (DNNs) have poor generalization, and large-batch SGD in particular tends to converge to sharp minima. It remains an open question whether escaping sharp minima can improve generalization. To answer this question, we propose the SmoothOut framework to smooth out sharp minima in DNNs and thereby improve generalization. In a nutshell, SmoothOut perturbs multiple copies of a DNN by noise injection and averages these copies. Injecting noise is widely used in the literature, but SmoothOut differs in several ways: (1) a de-noising process is applied before parameter updating; (2) noise strength is adapted to the filter norm; (3) an alternative interpretation of the advantage of noise injection is given, from the perspective of sharpness and generalization; (4) uniform noise is used instead of Gaussian noise. We prove that SmoothOut can eliminate sharp minima. Because training multiple DNN copies is inefficient, we further propose an unbiased stochastic SmoothOut, which only introduces the overhead of injecting and de-noising noise once per batch. An adaptive variant of SmoothOut, AdaSmoothOut, is also proposed. In a variety of experiments, SmoothOut and AdaSmoothOut consistently improve generalization in both small-batch and large-batch training on top of state-of-the-art solutions.
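
For concreteness, below is a minimal sketch of the per-batch perturb / compute-gradient / de-noise / update loop that the abstract describes, assuming a PyTorch-style training setup. The helper name smoothout_step, the noise range a, and the plain optimizer step are illustrative assumptions rather than the authors' reference implementation; the adaptive variant would additionally scale the noise range by each filter's norm.

    # Hypothetical sketch of a stochastic-SmoothOut-style update (not the paper's code).
    import torch

    def smoothout_step(model, loss_fn, batch, optimizer, a=0.01):
        """One update: inject uniform noise, backprop, de-noise, then step."""
        inputs, targets = batch

        # 1) Inject uniform noise U(-a, a) into every parameter and remember it.
        noises = []
        with torch.no_grad():
            for p in model.parameters():
                noise = torch.empty_like(p).uniform_(-a, a)
                p.add_(noise)
                noises.append(noise)

        # 2) Compute gradients at the perturbed parameters.
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()

        # 3) De-noise *before* the parameter update: remove the injected noise
        #    so the optimizer steps from the original weights using the
        #    gradients obtained at the perturbed point.
        with torch.no_grad():
            for p, noise in zip(model.parameters(), noises):
                p.sub_(noise)

        # 4) Apply the update.
        optimizer.step()
        return loss.item()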