NFDI4DS | UHH-SEMS - Publication Details

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

FOS: Computer and information sciences Computer Science - Machine Learning Artificial Intelligence (cs.AI) Computer Science - Artificial Intelligence Machine Learning (cs.LG)

DOI: 10.18653/v1/2024.emnlp-main.161 Publication Date: 2024-11-27T22:28:12Z

Abstract Supplemental Material References Cited by

AUTHORS (12)

Yufei Ma

Zihan Liang

Huangyu Dai

Ben Chen

Dehong Gao

Zhuoran Ran

Wang Zihan

Linbo Jin

Wen Jiang

Guannan Zhang

Xiaoyan Cai

Libin Yang

ABSTRACT

The growing demand for larger-scale models in the development of \textbf{L}arge \textbf{L}anguage \textbf{M}odels (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (\textbf{M}ixture \textbf{o}f \textbf{D}omain-Specific and \textbf{U}niversal \textbf{L}oR\textbf{A}), a novel \textbf{P}arameter \textbf{E}fficient \textbf{F}ine-\textbf{T}uning (PEFT) \textbf{M}ixture-\textbf{o}f-\textbf{E}xpert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80\% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.

SUPPLEMENTAL MATERIAL

Coming soon ....

REFERENCES (0)

CITATIONS (0)

EXTERNAL LINKS

CROSSREF - Publications OPENAIRE - Products

PlumX Metrics

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

RECOMMENDATIONS

FAIR ASSESSMENT

Coming soon ....

JUPYTER LAB

Coming soon ....