CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models

FOS: Computer and information sciences — Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2403.19137 Publication Date: 2024-03-28
ABSTRACT
Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has already been learned. Recently, pre-trained vision-language models such as CLIP, with their powerful generalization ability, have been gaining traction as practical CL candidates. However, the domain mismatch between pre-training and the downstream CL tasks calls for finetuning CLIP on the latter. The deterministic nature of existing finetuning methods makes them overlook the many possible interactions across modalities and deems them unsafe for high-risk tasks requiring reliable uncertainty estimation. To address these issues, our work proposes Continual LeArning with Probabilistic finetuning (CLAP). CLAP develops probabilistic modeling over task-specific modules with visual-guided text features, providing more calibrated fine-tuning in CL. It further alleviates forgetting by exploiting the rich pre-trained knowledge of CLIP for weight initialization and distribution regularization of the task-specific modules. Cooperating with a diverse range of existing prompting methods, CLAP can surpass the predominant deterministic finetuning approaches for CL with CLIP. Lastly, we study the superior uncertainty estimation abilities of CLAP for novel data detection and exemplar selection within existing CL setups. Our code is available at \url{https://github.com/srvCodes/clap4clip}.
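To make the abstract's core idea concrete, below is a minimal, hypothetical NumPy sketch of probabilistic finetuning in the spirit described: a per-task adapter maps (visual-guided) text features to a Gaussian (mean plus log-variance), and class probabilities are averaged over Monte Carlo samples of the adapted features, yielding an uncertainty-aware prediction. All names (`ProbabilisticAdapter`, `mc_logits`) and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ProbabilisticAdapter:
    """Hypothetical per-task probabilistic module: maps text features to a
    Gaussian over adapted features (mean + diagonal log-variance)."""
    def __init__(self, dim):
        # Initialize the mean map near identity so the adapter starts from
        # CLIP's pre-trained text features (cf. rich weight initialization).
        self.w_mu = np.eye(dim)
        self.w_logvar = np.full(dim, -4.0)  # small initial variance

    def sample(self, text_feat, n_samples=8):
        mu = text_feat @ self.w_mu
        std = np.exp(0.5 * self.w_logvar)
        eps = rng.standard_normal((n_samples,) + mu.shape)
        return mu + std * eps  # reparameterization-style sampling

def mc_logits(image_feat, text_feats, adapter, n_samples=8):
    """Average class probabilities over Monte Carlo samples of text features."""
    samples = adapter.sample(text_feats, n_samples)        # (S, C, D)
    samples /= np.linalg.norm(samples, axis=-1, keepdims=True)
    image_feat = image_feat / np.linalg.norm(image_feat)
    logits = samples @ image_feat                          # (S, C) cosine scores
    return softmax(100.0 * logits, axis=-1).mean(axis=0)   # (C,) mean probs

dim, n_classes = 16, 5
adapter = ProbabilisticAdapter(dim)
probs = mc_logits(rng.standard_normal(dim),
                  rng.standard_normal((n_classes, dim)), adapter)
print(probs.shape, float(probs.sum()))
```

The spread of the per-sample predictions (before averaging) gives a natural uncertainty signal, which is the kind of quantity one could use for the novel-data detection and exemplar-selection applications the abstract mentions.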