Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

DOI: 10.48550/arxiv.2404.08080 Publication Date: 2024-04-11
ABSTRACT
Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-shot and in-context learning when combined with suitable task prompts. In this work, we couple ZO methods with variance reduction techniques to enhance stability and convergence for inference-based LM fine-tuning. We introduce Memory-Efficient Zeroth-Order Stochastic Variance-Reduced Gradient (MeZO-SVRG) and demonstrate its efficacy across multiple LM fine-tuning tasks, eliminating the reliance on task-specific prompts. Evaluated on a range of both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG outperforms MeZO with up to a 20% increase in test accuracies in both full- and partial-parameter fine-tuning settings. MeZO-SVRG benefits from reduced computation time, as it often surpasses MeZO's peak test accuracy with a $2\times$ reduction in GPU-hours. Furthermore, MeZO-SVRG significantly reduces the required memory footprint compared to first-order SGD, i.e., by $2\times$ for the considered autoregressive models. Our experiments highlight that MeZO-SVRG's memory savings over SGD progressively improve with larger batch sizes.
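To make the core idea concrete, the sketch below combines a two-point SPSA-style zeroth-order gradient estimate (two forward passes per estimate) with an SVRG control variate on a toy least-squares problem. This is a minimal NumPy illustration under our own assumptions, not the paper's MeZO-SVRG implementation; the function names, hyperparameters, and toy loss are hypothetical.

```python
# Minimal NumPy sketch of a zeroth-order SVRG-style update on a toy
# least-squares problem. Illustrative assumption only -- not the authors'
# MeZO-SVRG code; loss, zo_grad, and all hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(256, 32))   # toy design matrix (stand-in for a dataset)
b = rng.normal(size=256)         # toy targets

def loss(x, idx):
    """Mean-squared error on the rows selected by idx (a minibatch)."""
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r * r)

def zo_grad(x, idx, z, mu=1e-3):
    """Two-point SPSA estimate: two forward passes along perturbation z."""
    return (loss(x + mu * z, idx) - loss(x - mu * z, idx)) / (2.0 * mu) * z

x = np.zeros(32)
lr, outer_iters, inner_iters, batch = 1e-2, 20, 50, 16
full_idx = np.arange(len(b))

for _ in range(outer_iters):
    # Anchor point: average several ZO estimates of the full-batch gradient.
    x_anchor = x.copy()
    g_full = np.mean(
        [zo_grad(x_anchor, full_idx, rng.normal(size=x.shape)) for _ in range(10)],
        axis=0,
    )
    for _ in range(inner_iters):
        idx = rng.choice(len(b), size=batch, replace=False)
        z = rng.normal(size=x.shape)  # shared perturbation for both terms
        # SVRG-style control variate built entirely from ZO estimates.
        g = zo_grad(x, idx, z) - zo_grad(x_anchor, idx, z) + g_full
        x -= lr * g

print("final full-batch loss:", loss(x, full_idx))
```

Sharing the perturbation z between the minibatch estimate at the current iterate and at the anchor correlates the two terms, which is what lets the control variate reduce, rather than add, variance.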