Neural networks with late-phase weights
Stochastic Gradient Descent
DOI:
10.48550/arxiv.2007.12927
Publication Date:
2020-01-01
AUTHORS (6)
ABSTRACT
The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring increased computational costs, we investigate a family of low-dimensional late-phase weight models which interact multiplicatively with the remaining parameters. Our results show that augmenting standard models with late-phase weights improves generalization on established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented by a theoretical analysis of a noisy quadratic problem, which provides a simplified picture of the late phases of neural network learning.
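The abstract's recipe (ensemble small late-phase weights that modulate the shared parameters multiplicatively, then average them back into a single model) can be illustrated with a short sketch. The PyTorch code below is a hypothetical, minimal rendering under stated assumptions: the class and method names (LatePhaseLinear, spawn_members, average_late_phase), the rank-1 multiplicative form, the member count K = 4, and the switch-on step T0 = 100 are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of the late-phase-weights idea, assuming standard PyTorch.
import torch
import torch.nn as nn


class LatePhaseLinear(nn.Module):
    """Linear layer whose shared weight W is modulated multiplicatively by one of
    K low-dimensional (rank-1) late-phase weight sets: W_k = W * outer(u_k, v_k)."""

    def __init__(self, in_features, out_features, num_members=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)     # shared base weights
        self.u = nn.Parameter(torch.ones(num_members, out_features))
        self.v = nn.Parameter(torch.ones(num_members, in_features))

    def forward(self, x, member=0):
        # Late-phase weights of member k interact multiplicatively with W.
        w = self.base.weight * torch.outer(self.u[member], self.v[member])
        return nn.functional.linear(x, w, self.base.bias)

    def spawn_members(self):
        # At step T0, every member starts from the current shared values.
        with torch.no_grad():
            self.u.copy_(self.u[0:1].clone().expand_as(self.u))
            self.v.copy_(self.v[0:1].clone().expand_as(self.v))

    def average_late_phase(self):
        # End of learning: obtain back a single model by spatially averaging
        # the late-phase weights across members.
        with torch.no_grad():
            self.u.copy_(self.u.mean(dim=0, keepdim=True).expand_as(self.u))
            self.v.copy_(self.v.mean(dim=0, keepdim=True).expand_as(self.v))


# Toy training loop on random data: ordinary SGD up to T0, then each minibatch
# updates the shared weights together with one late-phase ensemble member.
layer, T0, K = LatePhaseLinear(32, 10, num_members=4), 100, 4
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
for step in range(200):
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    if step == T0:
        layer.spawn_members()
    k = step % K if step >= T0 else 0
    loss = nn.functional.cross_entropy(layer(x, member=k), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
layer.average_late_phase()   # collapse the ensemble into a single model
```

Because only the small u and v tensors are replicated across members, the memory and compute overhead of the late-phase ensemble stays small compared with replicating the full network.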