Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach
DOI: 10.48550/arxiv.2406.04594
Publication Date: 2024-06-06
AUTHORS (25)
ABSTRACT
The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since the GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestion can greatly increase the waiting time of the GPUs. To address these challenges, this paper introduces a communication-driven solution, namely C4. The key insights of C4 are two-fold. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies certainly indicate some form of malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding the resource wastage caused by delays in anomaly detection. Second, the predictable communication pattern of model training, consisting of only a few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion. C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.
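To make the two insights more concrete, here is a minimal Python sketch, not the paper's actual implementation: it assumes per-rank collective-communication timings and a list of candidate network paths are available, and the function names, the `tolerance` threshold, and the greedy least-loaded-path assignment are illustrative choices rather than C4's real mechanisms.

```python
# Hypothetical sketch of C4's two key ideas (not the paper's implementation):
# 1) flag anomalous workers from the periodic, homogeneous timing of collective ops;
# 2) spread the few large parameter-sync flows across paths to reduce congestion.

from statistics import median
from typing import Dict, List


def flag_slow_ranks(allreduce_ms: Dict[int, float], tolerance: float = 1.5) -> List[int]:
    """Return ranks whose collective-communication time deviates from the group.

    Because every rank performs the same collective each iteration, healthy
    ranks should report nearly identical durations; a clear outlier points to
    a faulty component that should be isolated before the task is restarted.
    """
    typical = median(allreduce_ms.values())
    return [rank for rank, ms in allreduce_ms.items() if ms > tolerance * typical]


def plan_flows(flow_sizes: Dict[str, float], paths: List[str]) -> Dict[str, str]:
    """Greedily assign each large flow to the currently least-loaded path.

    Training traffic is predictable (a few large, repeating flows), so a simple
    static plan can avoid the congestion that hash-based placement may cause.
    """
    load = {p: 0.0 for p in paths}
    plan: Dict[str, str] = {}
    # Place the biggest flows first so they land on the emptiest paths.
    for flow, size in sorted(flow_sizes.items(), key=lambda kv: -kv[1]):
        best = min(load, key=load.get)
        plan[flow] = best
        load[best] += size
    return plan


if __name__ == "__main__":
    # Rank 3 is ~3x slower than its peers: likely a faulty GPU, NIC, or link.
    print(flag_slow_ranks({0: 12.1, 1: 11.9, 2: 12.3, 3: 36.5}))  # -> [3]
    # Two uplinks, three recurring flows: the two big flows get separate paths.
    print(plan_flows({"dp_sync": 8.0, "tp_sync": 6.0, "pp_act": 1.0},
                     ["uplink_a", "uplink_b"]))
```

The sketch relies only on the properties the abstract states: timing anomalies in otherwise homogeneous collectives signal faults, and the small number of large, predictable flows makes static traffic planning tractable.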