Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

Pruning
DOI: 10.48550/arxiv.2106.03795
Publication Date: 2021-01-01
ABSTRACT
Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks. Recent empirical studies have illustrated that even simple pruning strategies can be surprisingly effective, and several theoretical studies have shown that compressible networks (in specific senses) should achieve a low generalization error. Yet, a theoretical characterization of the underlying cause that makes the networks amenable to such simple compression schemes is still missing. In this study, we address this fundamental question and reveal that the dynamics of the training algorithm has a key role in obtaining such compressible networks. Focusing our attention on stochastic gradient descent (SGD), our main contribution is to link compressibility to two recently established properties of SGD: (i) as the network size goes to infinity, the system can converge to a mean-field limit, where the network weights behave independently; (ii) for a large step-size/batch-size ratio, the SGD iterates can converge to a heavy-tailed stationary distribution. In the case where these two phenomena occur simultaneously, we prove that the resulting networks are guaranteed to be '$\ell_p$-compressible', and the compression errors of different pruning techniques (magnitude, singular value, or node pruning) become arbitrarily small as the network size increases. We further prove generalization bounds adapted to our theoretical framework, which indeed confirm that the generalization error will be lower for more compressible networks. Our theory and numerical study on various neural networks show that large step-size/batch-size ratios introduce heavy tails, which, in combination with overparametrization, result in compressibility.
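As an illustrative sketch only (not the paper's experimental setup), the snippet below contrasts magnitude pruning of synthetic heavy-tailed weights, drawn here from an $\alpha$-stable distribution as a stand-in for the heavy-tailed SGD stationary distribution, with Gaussian weights. The 10% keep ratio, the choice of $\alpha = 1.5$, and the helper names are assumptions for illustration; the relative $\ell_2$ pruning error is typically noticeably smaller in the heavy-tailed case, mirroring the compressibility claim in the abstract.

```python
import numpy as np
from scipy.stats import levy_stable


def magnitude_prune(w, keep_ratio):
    """Zero out all but the largest-magnitude entries of w (illustrative helper)."""
    k = max(1, int(keep_ratio * w.size))
    idx = np.argsort(np.abs(w))[-k:]   # indices of the k largest magnitudes
    pruned = np.zeros_like(w)
    pruned[idx] = w[idx]
    return pruned


def relative_error(w, keep_ratio, p=2):
    """Relative l_p error incurred by magnitude pruning."""
    pruned = magnitude_prune(w, keep_ratio)
    return np.linalg.norm(w - pruned, p) / np.linalg.norm(w, p)


rng = np.random.default_rng(0)
n, keep = 100_000, 0.10  # dimension and keep ratio (assumed values for illustration)

# Light-tailed baseline vs. heavy-tailed (alpha-stable) weights.
w_gauss = rng.standard_normal(n)
w_heavy = levy_stable.rvs(alpha=1.5, beta=0.0, size=n, random_state=0)

print(f"Gaussian weights    : relative l2 pruning error = {relative_error(w_gauss, keep):.3f}")
print(f"Heavy-tailed weights: relative l2 pruning error = {relative_error(w_heavy, keep):.3f}")
```

The design choice here is simply that, for heavy-tailed weights, most of the $\ell_p$ norm is carried by a small fraction of large entries, so discarding the rest changes the vector little; this is the intuition behind the '$\ell_p$-compressibility' result stated above.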