Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
DOI:
10.48550/arxiv.1807.11205
Publication Date:
2018-07-30
AUTHORS (14)
ABSTRACT
Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the communication-to-computation ratio, it may hurt the generalization ability of the models. To this end, we build a highly scalable deep learning training system for dense GPU clusters with three main contributions: (1) We propose a mixed-precision training method that significantly improves the training throughput of a single GPU without losing accuracy. (2) We propose an optimization approach for extremely large mini-batch sizes (up to 64k) that can train CNN models on the ImageNet dataset without losing accuracy. (3) We propose highly optimized all-reduce algorithms that achieve up to 3x and 11x speedup on AlexNet and ResNet-50 respectively, compared with NCCL-based training on a cluster with 1024 Tesla P40 GPUs. On training ResNet-50 for 90 epochs, the state-of-the-art GPU-based system with 1024 Tesla P100 GPUs spent 15 minutes and achieved 74.9% top-1 test accuracy, and another KNL-based system with 2048 Intel KNLs spent 20 minutes and achieved 75.4% accuracy. Our training system achieves 75.8% top-1 test accuracy in only 6.6 minutes. When training AlexNet for 95 epochs, our system achieves 58.7% top-1 test accuracy within 4 minutes, which also outperforms all other existing systems.
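
The abstract only summarizes the mixed-precision method; the paper's implementation is not reproduced here. As a rough, non-authoritative sketch of the standard recipe it alludes to (FP16 compute with FP32 master weights and loss scaling), the PyTorch snippet below shows one way such a training step can be organized. The function names, the static loss_scale value, and the use of PyTorch are illustrative assumptions, not the authors' code.

    # Illustrative sketch only, not the paper's implementation.
    import torch
    import torch.nn as nn

    def make_mixed_precision_state(model):
        # Cast the model to FP16 and keep an FP32 "master" copy of each parameter.
        model = model.half()
        master_params = [p.detach().clone().float().requires_grad_(True)
                         for p in model.parameters()]
        return model, master_params

    def mixed_precision_step(model, master_params, optimizer, inputs, targets,
                             loss_scale=1024.0):
        criterion = nn.CrossEntropyLoss()
        outputs = model(inputs.half())              # FP16 forward pass
        loss = criterion(outputs.float(), targets)  # loss computed in FP32

        model.zero_grad()
        (loss * loss_scale).backward()              # scale to avoid FP16 gradient underflow

        # Copy FP16 gradients onto the FP32 master weights and undo the scaling.
        for master, p in zip(master_params, model.parameters()):
            if p.grad is not None:
                master.grad = p.grad.detach().float() / loss_scale

        optimizer.step()                            # update the FP32 master weights

        with torch.no_grad():                       # refresh FP16 weights from the masters
            for master, p in zip(master_params, model.parameters()):
                p.copy_(master.half())
        return loss.item()

In this sketch the optimizer is constructed over the FP32 master parameters (for example, torch.optim.SGD(master_params, lr=0.1, momentum=0.9)); production systems typically replace the static loss scale with dynamic loss scaling and, as the abstract indicates, combine it with large-batch optimization and efficient all-reduce communication to scale across the cluster.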