Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2404.16385
Publication Date: 2024-04-25
ABSTRACT
In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal, efficient fine-tuning mechanisms remains paramount, especially given that researchers in interdisciplinary fields are often extremely short of training resources, yet this question remains largely unexplored. Given the unique challenges of the medical domain, such as limited data scope and significant domain-specific requirements, evaluating and adapting Parameter-Efficient Fine-Tuning (PEFT) methods specifically for Med-VLMs is essential. Most current PEFT methods on Med-VLMs have yet to be comprehensively investigated and mainly focus on adding components to the model's structure or input. However, fine-tuning intrinsic model components often yields better generality and consistency, and its impact on the ultimate performance of Med-VLMs has been widely overlooked and remains understudied. In this paper, we endeavour to explore an alternative to traditional PEFT methods: fine-tuning the LayerNorm layers, FFNs and Attention layers of Med-VLMs. Our comprehensive studies span both small-scale and large-scale Med-VLMs, evaluating their performance under various fine-tuning paradigms across tasks such as Medical Visual Question Answering and Medical Imaging Report Generation. The findings reveal unique insights into the effects of intrinsic-parameter fine-tuning on downstream tasks and expose that fine-tuning solely the LayerNorm layers not only surpasses the efficiency of traditional PEFT methods but also retains the model's accuracy and generalization capabilities across a spectrum of tasks. The experiments show LayerNorm fine-tuning's superior adaptability and scalability, particularly in the context of large-scale Med-VLMs.
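As a rough illustration of the LayerNorm-only fine-tuning the abstract describes, the following is a minimal PyTorch sketch that freezes every parameter and then re-enables gradients only for LayerNorm affine weights and biases. The generic transformer encoder used here is a stand-in, not the paper's Med-VLM architecture, and the learning rate is illustrative only.

    # Minimal sketch of LayerNorm-only fine-tuning (assumed PyTorch setup;
    # the backbone below is a placeholder, not the paper's model).
    import torch
    import torch.nn as nn

    def freeze_all_but_layernorm(model: nn.Module) -> None:
        """Freeze every parameter, then unfreeze LayerNorm weights/biases."""
        for param in model.parameters():
            param.requires_grad = False
        for module in model.modules():
            if isinstance(module, nn.LayerNorm):
                for param in module.parameters():
                    param.requires_grad = True

    # Stand-in transformer backbone (hypothetical, for illustration).
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
        num_layers=4,
    )
    freeze_all_but_layernorm(model)

    # Report the trainable fraction: LayerNorm parameters are a tiny
    # share of the total, which is the efficiency argument in a nutshell.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"Trainable: {trainable}/{total} ({100 * trainable / total:.2f}%)")

    # Only the unfrozen LayerNorm parameters are handed to the optimizer.
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )

The same freezing routine applies unchanged to any transformer-based Med-VLM, since nn.LayerNorm instances are discovered by module type rather than by name.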