Stanza: Layer Separation for Distributed Training in Deep Learning
DOI:
10.48550/arxiv.1812.10624
Publication Date:
2018-01-01
AUTHORS (4)
ABSTRACT
The parameter server architecture is prevalently used for distributed deep learning. Each worker machine in a parameter server system trains the complete model, which leads to a hefty amount of network data transfer between workers and servers. We empirically observe that this data transfer has a non-negligible impact on training time. To tackle the problem, we design a new distributed training system called Stanza. Stanza exploits the fact that in many models, such as convolutional neural networks, most data exchange is attributed to the fully connected layers, while most computation is carried out in the convolutional layers. Thus, we propose layer separation in distributed training: the majority of nodes just train the convolutional layers, and the rest train the fully connected layers only. Gradients and parameters of the convolutional layers no longer need to be exchanged across the cluster, thereby substantially reducing the data transfer volume. We implement Stanza on PyTorch and evaluate its performance on Azure and EC2. Results show that Stanza accelerates training significantly over current parameter server systems: on EC2 instances with a Tesla V100 GPU and 10 Gb bandwidth, for example, Stanza is 1.34x--13.9x faster for common deep learning models.
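
The abstract describes the layer separation only at a high level. The sketch below, in PyTorch (the framework the paper uses), illustrates one way a CNN could be split into a convolutional part and a fully connected part so that only the FC side needs to be synchronized across the cluster. The class names, layer sizes, and single-process "node" roles are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumptions): a CNN split into a convolutional front and a
# fully connected back, mirroring Stanza's layer-separation idea. In the paper,
# most nodes would train only the conv part (whose gradients stay local) and a
# few nodes would train only the FC part (whose parameters are exchanged).
import torch
import torch.nn as nn

class ConvPart(nn.Module):
    """Convolutional layers: trained by the majority of nodes; their
    gradients/parameters need not cross the network."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
    def forward(self, x):
        return torch.flatten(self.features(x), 1)

class FCPart(nn.Module):
    """Fully connected layers: trained by the remaining nodes; these hold
    most of the parameters and account for most of the data exchange."""
    def __init__(self, in_features=64 * 8 * 8, num_classes=10):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )
    def forward(self, x):
        return self.classifier(x)

if __name__ == "__main__":
    conv_part, fc_part = ConvPart(), FCPart()
    x = torch.randn(4, 3, 32, 32)       # toy batch of 32x32 RGB images
    activations = conv_part(x)          # computed on a "conv node"
    # In a distributed deployment, these activations (rather than conv-layer
    # gradients) would be shipped to an "FC node"; here both parts run in one
    # process purely to show the split.
    logits = fc_part(activations)
    print(logits.shape)                 # torch.Size([4, 10])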