Neural networks with late-phase weights
Stochastic Gradient Descent
DOI:
10.48550/arxiv.2007.12927
Publication Date:
2020-01-01
AUTHORS (6)
ABSTRACT
The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring increased computational costs, we investigate a family of low-dimensional late-phase weight models which interact multiplicatively with the remaining parameters. Our results show that augmenting standard models with late-phase weights improves generalization on established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented by a theoretical analysis of a noisy quadratic problem, which provides a simplified picture of the late phases of neural network learning.
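The abstract's recipe (ensemble small late-phase weights that modulate the shared parameters multiplicatively, then average them back into a single model) can be illustrated with a short sketch. The PyTorch code below is a hypothetical, minimal rendering under stated assumptions: the class and method names (LatePhaseLinear, spawn_members, average_late_phase), the rank-1 multiplicative form, the member count K = 4, and the switch-on step T0 = 100 are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of the late-phase-weights idea, assuming standard PyTorch.
import torch
import torch.nn as nn


class LatePhaseLinear(nn.Module):
    """Linear layer whose shared weight W is modulated multiplicatively by one of
    K low-dimensional (rank-1) late-phase weight sets: W_k = W * outer(u_k, v_k)."""

    def __init__(self, in_features, out_features, num_members=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)     # shared base weights
        self.u = nn.Parameter(torch.ones(num_members, out_features))
        self.v = nn.Parameter(torch.ones(num_members, in_features))

    def forward(self, x, member=0):
        # Late-phase weights of member k interact multiplicatively with W.
        w = self.base.weight * torch.outer(self.u[member], self.v[member])
        return nn.functional.linear(x, w, self.base.bias)

    def spawn_members(self):
        # At step T0, every member starts from the current shared values.
        with torch.no_grad():
            self.u.copy_(self.u[0:1].clone().expand_as(self.u))
            self.v.copy_(self.v[0:1].clone().expand_as(self.v))

    def average_late_phase(self):
        # End of learning: obtain back a single model by spatially averaging
        # the late-phase weights across members.
        with torch.no_grad():
            self.u.copy_(self.u.mean(dim=0, keepdim=True).expand_as(self.u))
            self.v.copy_(self.v.mean(dim=0, keepdim=True).expand_as(self.v))


# Toy training loop on random data: ordinary SGD up to T0, then each minibatch
# updates the shared weights together with one late-phase ensemble member.
layer, T0, K = LatePhaseLinear(32, 10, num_members=4), 100, 4
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
for step in range(200):
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    if step == T0:
        layer.spawn_members()
    k = step % K if step >= T0 else 0
    loss = nn.functional.cross_entropy(layer(x, member=k), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
layer.average_late_phase()   # collapse the ensemble into a single model
```

Because only the small u and v tensors are replicated across members, the memory and compute overhead of the late-phase ensemble stays small compared with replicating the full network.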