MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning
FOS: Computer and information sciences
Computer Vision and Pattern Recognition (cs.CV)
DOI:
10.1609/aaai.v38i14.29540
Publication Date:
2024-03-25T11:27:46Z
ABSTRACT
Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network structure consists of a shared backbone and task-specific decoders. However, the complexity of the decoders increases with the number of tasks. To tackle this challenge, we integrate the decoder-free vision-language model CLIP, which exhibits robust zero-shot generalization capability. Recently, parameter-efficient transfer learning methods have been extensively explored with CLIP for adapting it to downstream tasks, where prompt tuning showcases strong potential. Nevertheless, these methods solely fine-tune a single modality (text or visual), disrupting the modality alignment of CLIP. In this paper, we first propose Multi-modal Alignment Prompt (MmAP) for CLIP, which aligns the text and visual modalities during the fine-tuning process. Building upon MmAP, we develop an innovative multi-task prompt learning framework. On one hand, to maximize the complementarity of tasks with high similarity, we utilize a gradient-driven task grouping method that partitions tasks into several disjoint groups and assigns a group-shared MmAP to each group. On the other hand, to preserve the unique characteristics of each task, we assign a task-specific MmAP to each task. Comprehensive experiments on two large multi-task learning datasets demonstrate that our method achieves significant performance improvements compared to full fine-tuning while only utilizing approximately 0.09% of trainable parameters.
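
To make the abstract's core ideas concrete, the sketch below shows a minimal MmAP-style module in PyTorch: a single learnable source prompt is projected into a text prompt and a visual prompt, so fine-tuning updates both CLIP modalities jointly rather than one in isolation, and a small helper illustrates the kind of gradient cosine-similarity signal a gradient-driven task grouping could use. All names and dimensions here (MmAP, d_source, prompt_length, the 512/768 encoder widths) are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch, assuming a frozen CLIP backbone whose encoders accept
    # prepended prompt tokens; names and dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class MmAP(nn.Module):
        """Generates aligned text and visual prompts from one shared source
        prompt, coupling the two CLIP modalities during fine-tuning."""

        def __init__(self, prompt_length: int = 4, d_source: int = 128,
                     d_text: int = 512, d_visual: int = 768):
            super().__init__()
            # One shared, learnable source prompt ties both modalities together.
            self.source = nn.Parameter(torch.randn(prompt_length, d_source) * 0.02)
            # Lightweight projections map the source into each encoder's space.
            self.to_text = nn.Linear(d_source, d_text)
            self.to_visual = nn.Linear(d_source, d_visual)

        def forward(self):
            # Returns (text_prompt, visual_prompt); each would be prepended to
            # the token sequence of the corresponding frozen CLIP encoder.
            return self.to_text(self.source), self.to_visual(self.source)

    def gradient_similarity(grad_a: torch.Tensor, grad_b: torch.Tensor) -> float:
        """Cosine similarity between two tasks' flattened gradients; tasks with
        high similarity could be grouped and given a group-shared MmAP."""
        return nn.functional.cosine_similarity(
            grad_a.flatten(), grad_b.flatten(), dim=0).item()

    if __name__ == "__main__":
        prompts = MmAP()
        text_p, visual_p = prompts()
        print(text_p.shape, visual_p.shape)  # torch.Size([4, 512]) torch.Size([4, 768])

In a full setup along these lines, each task group would hold one group-shared MmAP and each task one task-specific MmAP, with all CLIP weights frozen so only these few prompt parameters (roughly 0.09% of the model, per the abstract) are trained.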