Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Distributed learning
DOI: 10.48550/arxiv.2003.06307
Publication Date: 2020-01-01
ABSTRACT
Distributed deep learning (DL) has become prevalent in recent years to reduce the training time of ever-larger models and datasets by leveraging multiple computing devices (e.g., GPUs/TPUs). However, system scalability is limited by communication, which becomes the performance bottleneck. Addressing this communication issue has become a prominent research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, focusing on both system-level and algorithmic-level optimizations. We first propose a taxonomy of data-parallel distributed training algorithms that incorporates four primary dimensions: communication synchronization, system architectures, compression techniques, and the parallelism of communication and computing tasks. We then investigate state-of-the-art studies that address problems in these four dimensions. We also compare the convergence rates of different algorithms to understand their convergence speed. Additionally, we conduct extensive experiments to empirically compare various mainstream distributed training algorithms. Based on our communication cost analysis and the theoretical and experimental convergence speed comparison, we provide readers with an understanding of which algorithms are more efficient under specific distributed environments. We also extrapolate potential directions for further optimization.
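To make the "compression techniques" dimension of the taxonomy concrete, the minimal PyTorch sketch below illustrates top-k gradient sparsification, one common family of compression methods in which each worker transmits only the largest-magnitude gradient entries. This is an illustrative example rather than code from the surveyed paper; the function names topk_sparsify and desparsify and the 1% ratio are assumptions chosen for the sketch.

```python
import math
import torch

def topk_sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Returns the kept (signed) values and their flat indices; only these two
    small tensors need to be communicated to other workers.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices

def desparsify(values: torch.Tensor, indices: torch.Tensor, shape) -> torch.Tensor:
    """Rebuild a dense gradient: zeros everywhere except the transmitted entries."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.view(shape)

# Example: a worker compresses its local gradient before communication,
# and a receiver reconstructs a (sparse) dense tensor of the same shape.
grad = torch.randn(256, 128)
values, indices = topk_sparsify(grad, ratio=0.01)
recovered = desparsify(values, indices, grad.shape)  # ~99% of entries are zero
```

In practice such sparsification is typically combined with error feedback (accumulating the dropped entries locally) so that convergence is preserved, which is one of the trade-offs the survey compares across algorithms.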