MergeComp: A Compression Scheduler for Scalable Communication-Efficient Distributed Training

Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
DOI: 10.48550/arxiv.2103.15195
Publication Date: 2021-01-01
ABSTRACT
Large-scale distributed training is increasingly becoming communication bound. Many gradient compression algorithms have been proposed to reduce the communication overhead and improve the scalability of distributed training. However, it has been observed that in some cases gradient compression may even harm the performance of training. In this paper, we propose MergeComp, a compression scheduler to optimize the scalability of communication-efficient distributed training. It automatically schedules compression operations without knowledge of model architectures or system parameters. We applied MergeComp to nine popular compression algorithms. Our evaluations show that MergeComp can improve the performance of compression algorithms by up to 3.83x without losing accuracy. It can even achieve a scaling factor of up to 99% over high-speed networks.
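To make the scheduling idea concrete, the sketch below shows one way to merge many small per-layer gradients into a single buffer so that a compression algorithm runs once per merged buffer rather than once per layer, amortizing its per-call overhead. This is a minimal illustration, not the paper's implementation: the function names (topk_compress, merge_and_compress), the merge granularity, and the top-k compressor standing in for the nine evaluated algorithms are all assumptions for illustration.

```python
# Illustrative sketch (not the authors' code): merge per-layer gradients
# into one flat buffer, then compress the merged buffer once.
import numpy as np

def topk_compress(flat_grad, ratio=0.01):
    """Top-k sparsification: keep the largest-magnitude `ratio` of entries.

    Stands in here for any gradient compression algorithm; MergeComp itself
    is compressor-agnostic.
    """
    k = max(1, int(flat_grad.size * ratio))
    idx = np.argpartition(np.abs(flat_grad), -k)[-k:]
    return idx, flat_grad[idx]

def merge_and_compress(layer_grads, ratio=0.01):
    """Concatenate per-layer gradients and compress the merged buffer.

    Compressing one merged buffer instead of each layer separately avoids
    paying the compressor's startup cost for every small tensor, which is
    why naive per-layer compression can hurt training performance.
    """
    sizes = [g.size for g in layer_grads]  # kept so the buffer can be split back
    merged = np.concatenate([g.ravel() for g in layer_grads])
    idx, vals = topk_compress(merged, ratio)
    return (idx, vals), sizes

if __name__ == "__main__":
    grads = [np.random.randn(64, 64), np.random.randn(256), np.random.randn(10)]
    (idx, vals), sizes = merge_and_compress(grads, ratio=0.05)
    print(f"kept {vals.size} of {sum(sizes)} gradient values")
```

A real scheduler would additionally decide how many layers to merge per buffer, trading compression overhead against the overlap of communication with backpropagation; the fixed single-buffer choice above is the simplest possible policy.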