Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation

DOI: 10.48550/arxiv.2411.15419 Publication Date: 2024-11-22
ABSTRACT
Mixture-of-Experts (MoE) is an emerging technique for scaling large models with sparse activation. MoE models are typically trained in a distributed manner under an expert parallelism scheme, where the experts of each layer are placed across multiple GPUs. However, the default scheme suffers from a heavy network burden due to the all-to-all exchange of intermediate data among GPUs before and after each expert run. Some existing works have proposed to reduce these exchanges by transferring expert loads instead, which, however, decreases the level of execution parallelism and makes computation inefficient. These weaknesses motivate us to explore whether it is possible to reduce inter-GPU traffic while maintaining a high degree of parallelism. This paper gives a positive response by presenting Luffy, a communication-efficient MoE training system with two new techniques. First, Luffy migrates sequences to keep token pulling paths within a GPU and avoid copying tokens over the network. Second, we propose token condensation, which identifies similar tokens and then eliminates redundant transmissions. We implement Luffy based on PyTorch and evaluate its performance on a testbed of 16 V100 GPUs, where it achieves a speedup of up to 2.73x compared to state-of-the-art systems.
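To make the second technique more concrete, the sketch below shows one plausible form of token condensation applied before the all-to-all dispatch of expert parallelism: tokens routed to the same expert are greedily grouped by cosine similarity, only one representative per group is transmitted, and the expert's output is replicated back to the group members afterward. The similarity threshold, the greedy grouping rule, and the helper names (condense_tokens, restore_tokens) are illustrative assumptions, not Luffy's actual algorithm.

```python
# Hedged sketch of token condensation before all-to-all expert dispatch.
# Assumption: near-duplicate tokens (cosine similarity >= threshold) can share
# one representative; the paper's real condensation criterion may differ.
import torch


def condense_tokens(tokens: torch.Tensor, threshold: float = 0.95):
    """Greedily group near-duplicate tokens routed to one expert.

    tokens: (num_tokens, hidden_dim) activations.
    Returns (representatives, assignment) so expert outputs can later be
    scattered back to every original token slot.
    """
    normed = torch.nn.functional.normalize(tokens, dim=-1)
    sim = normed @ normed.T                          # pairwise cosine similarity
    assigned = torch.full((tokens.size(0),), -1, dtype=torch.long)
    reps = []
    for i in range(tokens.size(0)):
        if assigned[i] >= 0:                          # already absorbed by a group
            continue
        rep_id = len(reps)
        reps.append(i)
        # Absorb all not-yet-assigned tokens similar enough to token i
        # (this includes token i itself, since sim[i, i] == 1.0).
        members = (sim[i] >= threshold) & (assigned < 0)
        assigned[members] = rep_id
    return tokens[reps], assigned                     # (num_reps, hidden), (num_tokens,)


def restore_tokens(expert_out: torch.Tensor, assigned: torch.Tensor):
    """Replicate each representative's expert output to its group members."""
    return expert_out[assigned]


# Usage: only the representatives would travel through the all-to-all exchange.
x = torch.randn(1024, 768)
reps, assigned = condense_tokens(x)
y_reps = reps * 2.0                                   # stand-in for the expert MLP
y = restore_tokens(y_reps, assigned)                  # (1024, 768) restored outputs
```

In this sketch only the representatives cross the network; when many routed tokens are near-duplicates, the transmitted volume shrinks accordingly, which is the inter-GPU traffic reduction the abstract targets.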