MergeComp: A Compression Scheduler for Scalable Communication-Efficient Distributed Training
Machine Learning (cs.LG)
Distributed, Parallel, and Cluster Computing (cs.DC)
DOI:
10.48550/arxiv.2103.15195
Publication Date:
2021-01-01
AUTHORS (3)
ABSTRACT
Large-scale distributed training is increasingly becoming communication bound. Many gradient compression algorithms have been proposed to reduce the communication overhead and improve scalability. However, it has been observed that in some cases gradient compression may even harm the performance of distributed training. In this paper, we propose MergeComp, a compression scheduler to optimize the scalability of communication-efficient distributed training. It automatically schedules the compression operations to optimize the performance of compression algorithms without the knowledge of model architectures or system parameters. We have applied MergeComp to nine popular compression algorithms. Our evaluations show that MergeComp can improve the performance of compression algorithms by up to 3.83x without losing accuracy. It can even achieve a scaling factor of distributed training up to 99% over high-speed networks.
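To make the scheduling idea concrete, the following is a minimal, illustrative sketch of the kind of operation a compression scheduler reasons about: merging per-layer gradients into a single buffer and applying one compression pass (here, simple top-k sparsification) to the merged buffer, which amortizes the fixed per-tensor overhead of compression. This is a toy example in PyTorch under stated assumptions, not the paper's implementation; the names topk_compress and merge_and_compress are hypothetical.

import torch

def topk_compress(tensor: torch.Tensor, ratio: float = 0.01):
    # Keep only the largest-magnitude `ratio` fraction of entries
    # (top-k sparsification, one of many gradient compression schemes).
    flat = tensor.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices  # payload: surviving values + positions

def merge_and_compress(grads, ratio: float = 0.01):
    # Concatenate per-layer gradients into one buffer, then compress once.
    # A single merged buffer amortizes the fixed per-call cost (kernel
    # launches, index bookkeeping) that per-layer compression pays again
    # for every small tensor.
    sizes = [g.numel() for g in grads]
    merged = torch.cat([g.flatten() for g in grads])
    values, indices = topk_compress(merged, ratio)
    return values, indices, sizes  # `sizes` lets the receiver split the buffer

# Toy usage: gradients of a hypothetical three-layer model.
grads = [torch.randn(256, 128), torch.randn(128), torch.randn(128, 10)]
values, indices, sizes = merge_and_compress(grads, ratio=0.01)
print(f"sent {2 * values.numel()} numbers instead of {sum(sizes)}")

Merging more layers before compressing reduces per-tensor overhead but delays when communication can start; choosing a good merge granularity without knowing the model architecture or system parameters is, per the abstract, the kind of trade-off MergeComp navigates automatically.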