PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification

Keywords: Pooling · Feature learning · Image classification · Multi-modality
DOI: 10.48550/arxiv.2404.08915 Publication Date: 2024-04-13
ABSTRACT
Few-shot learning has been successfully applied to medical image classification, since only very few training examples are available. Because the number of annotated images is severely limited, representations should not be derived solely from a single image modality, which is insufficient for characterizing concept classes. In this paper, we propose a new prompting multi-modal model paradigm built on foundation models, called PM2. Besides the image modality, PM2 introduces a supplementary text input, known as a prompt, to further describe the corresponding images or concept classes and to facilitate few-shot learning across diverse modalities. To better explore the potential of prompt engineering, we empirically investigate five distinct prompt schemes under this paradigm. Furthermore, linear probing in foundation models acts as a classification head that takes only the class token as input, completely ignoring the rich statistics inherent in high-level visual tokens. We therefore perform classification on the feature distribution of visual tokens and on the class token simultaneously. To effectively mine such rich statistics, global covariance pooling with efficient matrix power normalization is used to aggregate the visual tokens. We then study how to combine the two classification heads: one is shared between the class token produced by the vision encoder and the prompt representation encoded by the text encoder; the other operates on the feature distribution of visual tokens from the vision encoder. Extensive experiments on three medical datasets show that our PM2 significantly outperforms its counterparts regardless of the prompt scheme and achieves state-of-the-art performance.
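To make the two ingredients of the abstract concrete, the following is a minimal NumPy sketch of (i) global covariance pooling of visual tokens with matrix power normalization and (ii) combining a CLIP-style class-token/prompt head with a linear head on the pooled covariance. It is an illustration under stated assumptions, not the paper's implementation: the function names, the eigendecomposition route to the matrix power (efficient implementations typically use an iterative Newton-Schulz scheme), and the weight matrix `W` are all hypothetical.

```python
import numpy as np

def covariance_pool(tokens, alpha=0.5, eps=1e-6):
    """Second-order (covariance) pooling of visual tokens with matrix
    power normalization. tokens: (N, D) array of N visual token features
    of dimension D. Returns a (D, D) normalized covariance matrix."""
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (tokens.shape[0] - 1)
    # Matrix power C^alpha via eigendecomposition; a small eps * I keeps
    # the matrix positive definite. (Practical systems often replace this
    # with a faster iterative Newton-Schulz approximation.)
    w, v = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    return (v * np.clip(w, 0.0, None) ** alpha) @ v.T

def two_head_logits(class_token, visual_tokens, text_features, W):
    """Combine the two heads described in the abstract. class_token: (D,)
    vision-encoder class token; visual_tokens: (N, D) high-level visual
    tokens; text_features: (C, D) encoded per-class prompts; W: (C, D*D)
    hypothetical linear-head weights on the flattened pooled covariance."""
    # Head 1: cosine similarity between the class token and class prompts.
    img = class_token / np.linalg.norm(class_token)
    txt = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    head1 = txt @ img                                    # (C,)
    # Head 2: linear classifier on the normalized covariance of tokens.
    head2 = W @ covariance_pool(visual_tokens).ravel()   # (C,)
    return head1 + head2
```

With `alpha=0.5` the pooling computes the matrix square root of the token covariance, which is the usual choice for matrix power normalization; setting `alpha=1.0` recovers the (regularized) raw covariance.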