gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

Keywords: lossy compression
DOI: 10.1145/3650200.3656636 · Published: 2024-06-03
ABSTRACT
GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into the collectives, which can lead to serious performance issues such as underutilized devices and uncontrolled data distortion. To address these issues, in this paper we propose gZCCL, the first-ever general framework that designs and optimizes GPU-aware, compression-enabled collectives with an accuracy-aware design to control error propagation. To validate our framework, we evaluate it on up to 512 NVIDIA A100 GPUs with real-world applications and datasets. Experimental results demonstrate that gZCCL-accelerated collectives, including both collective computation (Allreduce) and collective data movement (Scatter), outperform NCCL as well as Cray MPI by up to 4.5× and 28.7×, respectively. Furthermore, an accuracy evaluation with an image-stacking application confirms the high reconstructed data quality of our accuracy-aware framework.
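The core idea the abstract describes, applying an error-bounded lossy compressor inside a collective while keeping distortion controlled, can be illustrated with a minimal sketch. The code below is a simplified, CPU-only simulation under assumed details: it uses uniform scalar quantization as a stand-in for the GPU compressor (gZCCL builds on far more sophisticated compressors and communication pipelines), and it models an Allreduce over `P` simulated ranks where each input buffer is compressed once with absolute error bound `eb`, so the worst-case distortion of the sum is bounded by `P * eb`. All function names here are illustrative, not the gZCCL API.

```python
import numpy as np

def compress(x, eb):
    # Sketch of an error-bounded lossy compressor: uniform quantization with
    # step 2*eb guarantees a per-element absolute error of at most eb.
    return np.round(x / (2 * eb)).astype(np.int64)

def decompress(q, eb):
    # Reconstruct values from quantization codes.
    return q * (2 * eb)

def compressed_allreduce_sum(buffers, eb):
    # Simulated compression-enabled Allreduce (sum) over P "ranks":
    # each rank's buffer is compressed once before reduction, so the
    # accumulated error of the result is bounded by P * eb per element.
    total = np.zeros_like(buffers[0])
    for b in buffers:
        total += decompress(compress(b, eb), eb)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, n, eb = 8, 1000, 1e-3
    buffers = [rng.standard_normal(n) for _ in range(P)]
    exact = np.sum(buffers, axis=0)
    approx = compressed_allreduce_sum(buffers, eb)
    # Worst-case distortion bound: P * eb per element.
    print(np.max(np.abs(approx - exact)) <= P * eb + 1e-12)
```

The point of the bound is the "accuracy-aware design" the abstract highlights: by choosing `eb` per reduction stage, a framework can keep the error of the final reduced result within a user-specified tolerance instead of letting distortion compound uncontrolled.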