Visual In-Context Learning for Large Vision-Language Models
DOI:
10.48550/arxiv.2402.11574
Publication Date:
2024-02-18
AUTHORS (4)
ABSTRACT
In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Demonstration Retrieval, Intent-Oriented Image Summarization, and Demonstration Composition. Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarises them with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce the token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, extensive experiments leverage information flow analysis to elucidate why the method works and to investigate the impact of demonstration length and position for LVLMs. The use of in-context unlearning further shows promise for resetting specific model knowledge without retraining.
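To make the pipeline described in the abstract concrete, the following minimal Python sketch illustrates its three stages: demonstration retrieval via retrieve-then-rerank, intent-oriented image summarization, and composition of a language-only prompt. It is a sketch under assumed interfaces; embed_image, summarize, and compose_prompt are hypothetical placeholders standing in for an image encoder and an LVLM call, not the authors' released implementation.

# Illustrative sketch of a VICL-style pipeline (not the authors' code).
from dataclasses import dataclass

import numpy as np


@dataclass
class Demo:
    image_id: str
    embedding: np.ndarray  # precomputed image embedding (assumed available)
    label: str


def embed_image(image_id: str, dim: int = 8) -> np.ndarray:
    """Hypothetical image encoder; returns a deterministic pseudo-embedding."""
    rng = np.random.default_rng(abs(hash(image_id)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)


def retrieve(query_emb: np.ndarray, pool: list[Demo], k: int = 10) -> list[Demo]:
    """Stage 1: coarse retrieval of candidate demonstrations by cosine similarity."""
    return sorted(pool, key=lambda d: float(query_emb @ d.embedding), reverse=True)[:k]


def rerank(query_emb: np.ndarray, candidates: list[Demo], top_n: int = 3) -> list[Demo]:
    """Stage 2: rerank the shortlist (here the same similarity; a task-aware
    scorer or cross-encoder would be used in practice)."""
    return sorted(candidates, key=lambda d: float(query_emb @ d.embedding), reverse=True)[:top_n]


def summarize(demo: Demo, task_intent: str) -> str:
    """Intent-oriented image summarization; an LVLM call would go here."""
    return f"[summary of {demo.image_id} w.r.t. '{task_intent}']"


def compose_prompt(demos: list[Demo], task_intent: str, query_summary: str) -> str:
    """Compose language-based demonstrations into a single text prompt,
    replacing demonstration images with their summaries to save tokens."""
    lines = [f"Task: {task_intent}"]
    for d in demos:
        lines.append(f"Example: {summarize(d, task_intent)} -> {d.label}")
    lines.append(f"Query: {query_summary} ->")
    return "\n".join(lines)


if __name__ == "__main__":
    pool = [Demo(f"img_{i}", embed_image(f"img_{i}"), label=f"answer_{i}") for i in range(50)]
    q_emb = embed_image("query_img")
    demos = rerank(q_emb, retrieve(q_emb, pool, k=10), top_n=3)
    print(compose_prompt(demos, "visual reasoning", "[summary of query_img]"))

Running the script prints a text-only prompt in which each retrieved demonstration appears as an intent-conditioned summary plus its label, followed by the query; the prompt would then be fed to the LVLM for in-context prediction.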