Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Distributed learning
DOI: 10.48550/arxiv.2003.06307
Publication Date: 2020-01-01
ABSTRACT
Distributed deep learning (DL) has become prevalent in recent years to reduce the training time of ever-larger models and datasets by leveraging multiple computing devices (e.g., GPUs/TPUs). However, system scalability is limited by communication, which becomes the performance bottleneck. Addressing this communication issue has become a prominent research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, focusing on both system-level and algorithmic-level optimizations. We first propose a taxonomy of data-parallel distributed training algorithms that incorporates four primary dimensions: communication synchronization, system architectures, compression techniques, and the parallelism of communication and computing tasks. We then investigate state-of-the-art studies that address problems in these four dimensions. We also compare the convergence rates of different algorithms to understand their convergence speed. Additionally, we conduct extensive experiments to empirically compare various mainstream distributed training algorithms. Based on our communication cost analysis and the theoretical and experimental convergence speed comparison, we provide readers with an understanding of which algorithms are more efficient under specific distributed environments. We also extrapolate potential directions for further optimization.
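To make the "compression techniques" dimension of the taxonomy concrete, the minimal PyTorch sketch below illustrates top-k gradient sparsification, one common family of compression methods in which each worker transmits only the largest-magnitude gradient entries. This is an illustrative example rather than code from the surveyed paper; the function names topk_sparsify and desparsify and the 1% ratio are assumptions chosen for the sketch.

```python
import math
import torch

def topk_sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Returns the kept (signed) values and their flat indices; only these two
    small tensors need to be communicated to other workers.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices

def desparsify(values: torch.Tensor, indices: torch.Tensor, shape) -> torch.Tensor:
    """Rebuild a dense gradient: zeros everywhere except the transmitted entries."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.view(shape)

# Example: a worker compresses its local gradient before communication,
# and a receiver reconstructs a (sparse) dense tensor of the same shape.
grad = torch.randn(256, 128)
values, indices = topk_sparsify(grad, ratio=0.01)
recovered = desparsify(values, indices, grad.shape)  # ~99% of entries are zero
```

In practice such sparsification is typically combined with error feedback (accumulating the dropped entries locally) so that convergence is preserved, which is one of the trade-offs the survey compares across algorithms.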