gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

Keywords: lossy compression
DOI: 10.1145/3650200.3656636 · Published: 2024-06-03
ABSTRACT
GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into the collectives, which can lead to serious performance issues such as underutilized devices and uncontrolled data distortion. To address these issues, in this paper we propose gZCCL, the first-ever general framework that designs and optimizes GPU-aware, compression-enabled collectives with an accuracy-aware design to control error propagation. To validate our framework, we evaluate it on up to 512 NVIDIA A100 GPUs with real-world applications and datasets. Experimental results demonstrate that gZCCL-accelerated collectives, including both collective computation (Allreduce) and collective data movement (Scatter), outperform NCCL as well as Cray MPI by up to 4.5× and 28.7×, respectively. Furthermore, an accuracy evaluation with an image-stacking application confirms the high reconstructed data quality of our accuracy-aware framework.
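The core idea the abstract describes, applying an error-bounded lossy compressor inside a collective while keeping distortion controlled, can be illustrated with a minimal sketch. The code below is a simplified, CPU-only simulation under assumed details: it uses uniform scalar quantization as a stand-in for the GPU compressor (gZCCL builds on far more sophisticated compressors and communication pipelines), and it models an Allreduce over `P` simulated ranks where each input buffer is compressed once with absolute error bound `eb`, so the worst-case distortion of the sum is bounded by `P * eb`. All function names here are illustrative, not the gZCCL API.

```python
import numpy as np

def compress(x, eb):
    # Sketch of an error-bounded lossy compressor: uniform quantization with
    # step 2*eb guarantees a per-element absolute error of at most eb.
    return np.round(x / (2 * eb)).astype(np.int64)

def decompress(q, eb):
    # Reconstruct values from quantization codes.
    return q * (2 * eb)

def compressed_allreduce_sum(buffers, eb):
    # Simulated compression-enabled Allreduce (sum) over P "ranks":
    # each rank's buffer is compressed once before reduction, so the
    # accumulated error of the result is bounded by P * eb per element.
    total = np.zeros_like(buffers[0])
    for b in buffers:
        total += decompress(compress(b, eb), eb)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, n, eb = 8, 1000, 1e-3
    buffers = [rng.standard_normal(n) for _ in range(P)]
    exact = np.sum(buffers, axis=0)
    approx = compressed_allreduce_sum(buffers, eb)
    # Worst-case distortion bound: P * eb per element.
    print(np.max(np.abs(approx - exact)) <= P * eb + 1e-12)
```

The point of the bound is the "accuracy-aware design" the abstract highlights: by choosing `eb` per reduction stage, a framework can keep the error of the final reduced result within a user-specified tolerance instead of letting distortion compound uncontrolled.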