Profiling DNN Workloads on a Volta-based DGX-1 System
Keywords: Profiling (computer programming), Deep Neural Networks, Residual neural network
DOI:
10.1109/iiswc.2018.8573521
Publication Date:
2018-12-13T19:48:21Z
AUTHORS (8)
ABSTRACT
High performance multi-GPU systems are widely used to accelerate the training of deep neural networks (DNNs) by exploiting the inherently massive parallel nature of the training process. Typically, the training of DNNs in multi-GPU systems leverages a data-parallel model in which the DNN is replicated on every GPU, and each GPU performs Forward Propagation (FP), Backward Propagation (BP) and Weight Update (WU). We analyze the WU stage, which is composed of collective communication (e.g., allReduce, broadcast) and demands very efficient communication among the GPUs to avoid diminishing returns when scaling the number of GPUs in the system. To overcome this issue, different data transfer mechanisms and libraries have been introduced by NVIDIA and adopted by high-level frameworks to train DNNs. In this work, we evaluate and compare the performance of the peer-to-peer (P2P) data transfer method and the NCCL library-based communication method for training DNNs on a DGX-1 system consisting of 8 NVIDIA Volta-based GPUs. We profile the training of five popular DNNs (GoogLeNet, AlexNet, Inception-v3, ResNet and LeNet) using 1, 2, 4 and 8 GPUs. We show the breakdown of training time across the FP+BP stage and the WU stage to provide insights about the limiting factors of the training algorithm as well as to identify bottlenecks in the multi-GPU system architecture. Our detailed profiling and analysis can help programmers and system designers accelerate the training process in DNNs.
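The WU stage described in the abstract reduces to a collective all-reduce over the per-GPU gradient buffers. As a minimal sketch of what that looks like with the NCCL API in CUDA C, the code below issues a summing allReduce across all GPUs visible to a single process; the buffer size, single-process setup, and omitted error checking are illustrative assumptions, not the paper's benchmark code.

#include <cuda_runtime.h>
#include <nccl.h>

// Sketch: NCCL allReduce of one gradient buffer per GPU, as in the
// Weight Update (WU) stage of data-parallel DNN training.
// Error checking is omitted for brevity.
int main(void) {
    int nGPUs = 0;
    cudaGetDeviceCount(&nGPUs);
    if (nGPUs > 8) nGPUs = 8;            // a DGX-1 has 8 GPUs

    ncclComm_t comms[8];
    int devs[8];
    float *grads[8];
    cudaStream_t streams[8];
    const size_t count = 1 << 20;        // hypothetical gradient size (1M floats)

    for (int i = 0; i < nGPUs; ++i) {
        devs[i] = i;
        cudaSetDevice(i);
        cudaMalloc((void **)&grads[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // One communicator per GPU within a single process.
    ncclCommInitAll(comms, nGPUs, devs);

    // Sum the per-GPU gradients in place; afterwards every GPU holds the
    // globally reduced gradient used for its local weight update.
    ncclGroupStart();
    for (int i = 0; i < nGPUs; ++i)
        ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nGPUs; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(grads[i]);
        cudaStreamDestroy(streams[i]);
    }
    for (int i = 0; i < nGPUs; ++i)
        ncclCommDestroy(comms[i]);
    return 0;
}

On NVLink-connected systems such as the DGX-1, NCCL typically routes such collectives over ring patterns across the NVLink topology, which is the behavior the paper contrasts with direct P2P copies.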