Profiling DNN Workloads on a Volta-based DGX-1 System
Keywords: Profiling (computer programming), Deep Neural Networks, Residual neural network
DOI:
10.1109/iiswc.2018.8573521
Publication Date:
2018-12-13T19:48:21Z
AUTHORS (8)
ABSTRACT
High performance multi-GPU systems are widely used to accelerate the training of deep neural networks (DNNs) by exploiting the inherently massive parallel nature of the training process. Typically, the training of DNNs in multi-GPU systems leverages a data-parallel model in which the DNN is replicated on every GPU, and each GPU performs Forward Propagation (FP), Backward Propagation (BP) and Weight Update (WU). We analyze the WU stage, which is composed of collective communication (e.g., allReduce, broadcast) and demands very efficient communication among the GPUs to avoid diminishing returns when scaling the number of GPUs in the system. To overcome this issue, different data transfer mechanisms and libraries have been introduced by NVIDIA and adopted by high-level frameworks to train DNNs. In this work, we evaluate and compare the performance of the peer-to-peer (P2P) data transfer method and the NCCL library-based communication method for training DNNs on a DGX-1 system consisting of 8 NVIDIA Volta-based GPUs. We profile the training of five popular DNNs (GoogLeNet, AlexNet, Inception-v3, ResNet and LeNet) using 1, 2, 4 and 8 GPUs. We show the breakdown of training time across the FP+BP stage and the WU stage to provide insights about the limiting factors of the training algorithm as well as to identify bottlenecks in the multi-GPU system architecture. Our detailed profiling and analysis can help programmers and system designers accelerate the training process in DNNs.
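The WU stage described in the abstract reduces to a collective all-reduce over the per-GPU gradient buffers. As a minimal sketch of what that looks like with the NCCL API in CUDA C, the code below issues a summing allReduce across all GPUs visible to a single process; the buffer size, single-process setup, and omitted error checking are illustrative assumptions, not the paper's benchmark code.

#include <cuda_runtime.h>
#include <nccl.h>

// Sketch: NCCL allReduce of one gradient buffer per GPU, as in the
// Weight Update (WU) stage of data-parallel DNN training.
// Error checking is omitted for brevity.
int main(void) {
    int nGPUs = 0;
    cudaGetDeviceCount(&nGPUs);
    if (nGPUs > 8) nGPUs = 8;            // a DGX-1 has 8 GPUs

    ncclComm_t comms[8];
    int devs[8];
    float *grads[8];
    cudaStream_t streams[8];
    const size_t count = 1 << 20;        // hypothetical gradient size (1M floats)

    for (int i = 0; i < nGPUs; ++i) {
        devs[i] = i;
        cudaSetDevice(i);
        cudaMalloc((void **)&grads[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // One communicator per GPU within a single process.
    ncclCommInitAll(comms, nGPUs, devs);

    // Sum the per-GPU gradients in place; afterwards every GPU holds the
    // globally reduced gradient used for its local weight update.
    ncclGroupStart();
    for (int i = 0; i < nGPUs; ++i)
        ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nGPUs; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(grads[i]);
        cudaStreamDestroy(streams[i]);
    }
    for (int i = 0; i < nGPUs; ++i)
        ncclCommDestroy(comms[i]);
    return 0;
}

On NVLink-connected systems such as the DGX-1, NCCL typically routes such collectives over ring patterns across the NVLink topology, which is the behavior the paper contrasts with direct P2P copies.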