Stanza: Layer Separation for Distributed Training in Deep Learning
DOI:
10.48550/arxiv.1812.10624
Publication Date:
2018-01-01
AUTHORS (4)
ABSTRACT
The parameter server architecture is prevalently used for distributed deep learning. Each worker machine in a parameter server system trains the complete model, which leads to a hefty amount of network data transfer between workers and servers. We empirically observe that this data transfer has a non-negligible impact on training time. To tackle the problem, we design a new distributed training system called Stanza. Stanza exploits the fact that in many models, such as convolutional neural networks, most data exchange is attributed to the fully connected layers, while most computation is carried out in the convolutional layers. Thus, we propose layer separation in distributed training: the majority of nodes just train the convolutional layers, and the rest train the fully connected layers only. Gradients and parameters of the convolutional layers no longer need to be exchanged across the cluster, thereby substantially reducing the data transfer volume. We implement Stanza on PyTorch and evaluate its performance on Azure and EC2. Results show that Stanza accelerates training significantly over current parameter server systems: on EC2 instances with a Tesla V100 GPU and 10 Gb bandwidth, for example, Stanza is 1.34x--13.9x faster for common deep learning models.
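
The abstract describes the layer separation only at a high level. The sketch below, in PyTorch (the framework the paper uses), illustrates one way a CNN could be split into a convolutional part and a fully connected part so that only the FC side needs to be synchronized across the cluster. The class names, layer sizes, and single-process "node" roles are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumptions): a CNN split into a convolutional front and a
# fully connected back, mirroring Stanza's layer-separation idea. In the paper,
# most nodes would train only the conv part (whose gradients stay local) and a
# few nodes would train only the FC part (whose parameters are exchanged).
import torch
import torch.nn as nn

class ConvPart(nn.Module):
    """Convolutional layers: trained by the majority of nodes; their
    gradients/parameters need not cross the network."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
    def forward(self, x):
        return torch.flatten(self.features(x), 1)

class FCPart(nn.Module):
    """Fully connected layers: trained by the remaining nodes; these hold
    most of the parameters and account for most of the data exchange."""
    def __init__(self, in_features=64 * 8 * 8, num_classes=10):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )
    def forward(self, x):
        return self.classifier(x)

if __name__ == "__main__":
    conv_part, fc_part = ConvPart(), FCPart()
    x = torch.randn(4, 3, 32, 32)       # toy batch of 32x32 RGB images
    activations = conv_part(x)          # computed on a "conv node"
    # In a distributed deployment, these activations (rather than conv-layer
    # gradients) would be shipped to an "FC node"; here both parts run in one
    # process purely to show the split.
    logits = fc_part(activations)
    print(logits.shape)                 # torch.Size([4, 10])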