DoRA: Weight-Decomposed Low-Rank Adaptation
DOI:
10.48550/arxiv.2402.09353
Publication Date:
2024-02-14
AUTHORS (7)
ABSTRACT
Among the widely used parameter-efficient finetuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from our findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
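To make the decomposition concrete, below is a minimal PyTorch sketch of the idea described in the abstract: the frozen pre-trained weight is split into a magnitude vector and a direction, and only the magnitude and a low-rank (LoRA-style) directional update are trained. The class name DoRALinear, the rank, the initialization scale, and the per-output-feature normalization convention are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoRALinear(nn.Module):
    """Sketch of a DoRA-style linear layer (illustrative, not the authors' code).

    The frozen pre-trained weight W0 is decomposed into a trainable magnitude
    vector m (one entry per output feature, initialized to the norm of the
    corresponding row of W0) and a direction that is adapted through a
    low-rank update B @ A, as in LoRA.
    """

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_features, in_features = base.weight.shape

        # Frozen pre-trained weight W0 (and bias, if any).
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias

        # Trainable low-rank directional update: delta_V = B @ A.
        # B starts at zero so the layer initially reproduces the base model.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

        # Trainable magnitude m, initialized to the per-row norm of W0.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Direction V = W0 + B @ A, normalized so only its orientation matters.
        v = self.weight + self.lora_B @ self.lora_A
        v_norm = v.norm(p=2, dim=1, keepdim=True)

        # Merged weight W' = m * V / ||V||; m and the LoRA factors are the
        # only trainable parameters.
        w = self.magnitude * v / v_norm
        return F.linear(x, w, self.bias)


# Example: wrap a pre-trained linear layer and fine-tune only m, A, and B.
layer = DoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 768))
print(out.shape)  # torch.Size([2, 768])
```

Because the merged weight W' can be computed once after training and substituted for the original weight, a formulation like this adds no extra cost at inference time, consistent with the property stated in the abstract.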