ATOMO: Communication-efficient Learning via Atomic Sparsification
DOI:
10.48550/arxiv.1806.04090
Publication Date:
2018-01-01
AUTHORS (6)
Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, Stephen Wright
ABSTRACT
Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms that minimizes variance. We show that recent methods such as QSGD and TernGrad are special cases of ATOMO, and that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training.
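As a rough illustration of the idea described in the abstract (not the authors' released code), the following NumPy sketch shows unbiased atomic sparsification: given the coefficients of a gradient in some atomic decomposition and a sparsity budget s, each atom i is kept with probability p_i = min(1, s|lambda_i| / sum_j |lambda_j|) (with leftover budget redistributed among uncapped atoms) and rescaled by 1/p_i so the result is unbiased. Function names and the example dimensions are invented for this sketch.

```python
import numpy as np

def sampling_probabilities(abs_coeffs, s):
    """Probabilities p_i = min(1, s*|l_i| / sum over free atoms of |l_j|).

    Atoms whose candidate probability exceeds 1 are kept with certainty,
    and the leftover budget is redistributed among the remaining "free"
    atoms. This mirrors the capped-probability selection described for
    ATOMO; it is a sketch, not the paper's exact Algorithm 1.
    """
    p = np.zeros_like(abs_coeffs)
    free = np.ones(abs_coeffs.shape, dtype=bool)
    budget = float(s)
    while free.any() and budget > 0 and abs_coeffs[free].sum() > 0:
        candidate = budget * abs_coeffs[free] / abs_coeffs[free].sum()
        if (candidate <= 1.0).all():
            p[free] = candidate
            break
        fixed = np.flatnonzero(free)[candidate > 1.0]
        p[fixed] = 1.0          # these atoms are always transmitted
        free[fixed] = False
        budget = float(s) - p[~free].sum()
    return p

def atomic_sparsify(coeffs, s, rng=None):
    """Randomly zero out atoms so that E[result] == coeffs (unbiased)."""
    rng = np.random.default_rng() if rng is None else rng
    coeffs = np.asarray(coeffs, dtype=float)
    p = sampling_probabilities(np.abs(coeffs), s)
    keep = rng.random(coeffs.shape) < p
    out = np.zeros_like(coeffs)
    out[keep] = coeffs[keep] / p[keep]   # 1/p_i rescaling keeps it unbiased
    return out

# Example: spectral sparsification of a layer's gradient matrix, i.e.
# treating rank-1 SVD terms as the atoms. Element-wise sparsification
# would instead pass grad.ravel() directly.
grad = np.random.randn(64, 32)
U, S, Vt = np.linalg.svd(grad, full_matrices=False)
S_sparse = atomic_sparsify(S, s=4)        # keep roughly 4 singular-value atoms
kept = S_sparse != 0
estimate = (U[:, kept] * S_sparse[kept]) @ Vt[kept]  # unbiased estimate of grad
```

Only the kept singular triples (a few vectors and scalars) need to be communicated, which is where the bandwidth savings over transmitting the full gradient come from.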