ZeRO-Offload: Democratizing Billion-Scale Model Training

DOI: 10.48550/arxiv.2101.06840 Publication Date: 2021-01-01
ABSTRACT
Large-scale model training has been a playing ground for a limited few, requiring complex model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload changes the large model training landscape by making large model training accessible to nearly everyone. It can train models with over 13 billion parameters on a single GPU, a 10x increase in size compared to popular frameworks such as PyTorch, and it does so without requiring any model change from the data scientists or sacrificing computational efficiency. ZeRO-Offload enables large model training by offloading data and compute to the CPU. To preserve compute efficiency, it is designed to minimize data movement to/from the GPU and reduce CPU compute time while maximizing memory savings on the GPU. As a result, ZeRO-Offload can achieve 40 TFlops/GPU on a single NVIDIA V100 GPU for a 10B-parameter model, compared to 30 TFlops using PyTorch alone for a 1.4B-parameter model, the largest that can be trained without running out of memory. ZeRO-Offload is also designed to scale on multiple GPUs when available, offering near-linear speedup on up to 128 GPUs. Additionally, it can work together with model parallelism to train models with over 70 billion parameters on a single DGX-2 box, a 4.5x increase in model size compared to using model parallelism alone. By combining compute and memory efficiency with ease of use, ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU.
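The sketch below illustrates the offload pattern the abstract describes, not the DeepSpeed implementation itself: fp16 forward and backward passes stay on the GPU, while the fp32 master parameters, Adam optimizer states, and the parameter update are kept on the CPU, with gradients and updated weights shuttled across. Layer sizes, the single-linear-layer model, and the absence of loss scaling are simplifying assumptions for illustration only.

import torch

# Fall back to fp32 on machines without a GPU (the offload idea still applies).
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Compute-heavy fp16 model lives on the GPU.
model = torch.nn.Linear(1024, 1024).to(device=device, dtype=dtype)

# fp32 master copies of the parameters plus the Adam states (m, v) stay in CPU memory.
cpu_master = [p.detach().float().cpu() for p in model.parameters()]
optimizer = torch.optim.Adam(cpu_master, lr=1e-3)

def train_step(x, y):
    # 1) GPU: fp16 forward and backward (loss scaling omitted for brevity).
    loss = torch.nn.functional.mse_loss(model(x), y)
    model.zero_grad(set_to_none=True)
    loss.backward()

    # 2) GPU -> CPU: copy gradients onto the fp32 master parameters.
    for p, m in zip(model.parameters(), cpu_master):
        m.grad = p.grad.detach().float().cpu()

    # 3) CPU: optimizer step runs entirely on the CPU copies,
    #    so neither the fp32 weights nor the Adam states consume GPU memory.
    optimizer.step()

    # 4) CPU -> GPU: write the updated fp32 parameters back into the fp16 model.
    with torch.no_grad():
        for p, m in zip(model.parameters(), cpu_master):
            p.copy_(m.to(device=device, dtype=p.dtype))
    return loss.item()

x = torch.randn(32, 1024, device=device, dtype=dtype)
y = torch.randn(32, 1024, device=device, dtype=dtype)
print(train_step(x, y))

In the released DeepSpeed library, this behavior is enabled through the ZeRO configuration rather than hand-written transfers, and the system additionally overlaps CPU-GPU communication with computation to keep the GPU busy.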