Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective

DOI: 10.48550/arxiv.2502.01524 Publication Date: 2025-02-03
ABSTRACT
The integration of vision-language modalities has been a significant focus in multimodal learning, traditionally relying on Vision-Language Pretrained Models. However, with the advent of Large Language Models (LLMs), there has been a notable shift towards incorporating LLMs with vision modalities. Following this, the training paradigms for integrating vision modalities into LLMs have evolved. Initially, the approach was to integrate the modalities through pretraining the modality integrator, a paradigm named Single-stage Tuning. It has since branched out into methods focusing on performance enhancement, denoted as Two-stage Tuning, and those prioritizing parameter efficiency, referred to as Direct Adaptation. However, existing surveys primarily address the latest Vision Large Language Models (VLLMs), leaving a gap in understanding the evolution of training paradigms and their unique parameter-efficient considerations. This paper categorizes and reviews 34 VLLMs from top conferences, journals, and highly cited Arxiv papers, focusing on parameter efficiency during adaptation from the training paradigm perspective. We first introduce the LLM architecture and parameter-efficient learning methods, followed by a discussion of vision encoders and a comprehensive taxonomy of modality integrators. We then review the three training paradigms and their efficiency considerations, summarizing benchmarks in the VLLM field. To gain deeper insights into the effectiveness of these methods, we compare and discuss the experimental results of representative models, among which the experiment of Direct Adaptation is replicated. Providing insights into recent developments and practical uses, this survey is a vital guide for researchers and practitioners navigating the efficient integration of vision modalities into LLMs.
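
To make the integrator-centric setup described above concrete, the following is a minimal sketch (not taken from the paper) of a generic VLLM layout in PyTorch: a small trainable "modality integrator" (here, a single linear projector) maps features from a frozen vision encoder into the frozen LLM's embedding space, so that only the integrator's parameters are updated, in the spirit of Single-stage Tuning. The module names, dimensions, and dummy encoder/LLM stand-ins are illustrative assumptions, not the survey's or any model's actual API.

    # Illustrative sketch only: names and sizes are assumptions, not from the survey.
    import torch
    import torch.nn as nn

    class ModalityIntegrator(nn.Module):
        """Projects vision features (d_vision) into the LLM token-embedding space (d_llm)."""

        def __init__(self, d_vision: int, d_llm: int):
            super().__init__()
            self.proj = nn.Linear(d_vision, d_llm)

        def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
            # vision_feats: (batch, num_patches, d_vision) -> (batch, num_patches, d_llm)
            return self.proj(vision_feats)

    def freeze_backbones(vision_encoder: nn.Module, llm: nn.Module,
                         integrator: ModalityIntegrator):
        """Single-stage-Tuning-style setup: freeze the vision encoder and the LLM,
        leaving only the integrator trainable (the parameter-efficient component)."""
        for p in vision_encoder.parameters():
            p.requires_grad = False
        for p in llm.parameters():
            p.requires_grad = False
        return [p for p in integrator.parameters() if p.requires_grad]

    if __name__ == "__main__":
        # Dummy stand-in modules so the sketch runs without pretrained checkpoints.
        d_vision, d_llm = 768, 4096
        vision_encoder = nn.Linear(3 * 14 * 14, d_vision)   # placeholder patch encoder
        llm = nn.Linear(d_llm, d_llm)                       # placeholder LLM block
        integrator = ModalityIntegrator(d_vision, d_llm)

        trainable = freeze_backbones(vision_encoder, llm, integrator)
        print(f"trainable integrator parameters: {sum(p.numel() for p in trainable):,}")

Under this assumed setup, Two-stage Tuning would additionally unfreeze (parts of) the LLM in a later stage, while Direct Adaptation would instead attach lightweight adapters to the frozen LLM; the sketch only illustrates the shared integrator pattern.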