Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment

FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2406.16641 Publication Date: 2024-06-24
ABSTRACT
Recently, textual prompt tuning has shown inspirational performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such a uni-modal prompt learning method only tunes the language branch of CLIP models. This is not enough for AI-generated image quality assessment (AGIQA), since AI-generated images (AGIs) visually differ from natural images. In addition, the consistency between AGIs and user input text prompts, which correlates with the perceptual quality of AGIs, has not been investigated to guide AGIQA. In this letter, we propose vision-language consistency guided multi-modal prompt learning for blind AGIQA, dubbed CLIP-AGIQA. Specifically, we introduce learnable textual and visual prompts in the language and vision branches of CLIP models, respectively. Moreover, we design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts. Experimental results on two public AGIQA datasets demonstrate that the proposed method outperforms state-of-the-art quality assessment models. The source code is available at https://github.com/JunFu1995/CLIP-AGIQA.
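As a rough illustration of the multi-modal prompt-tuning idea summarized above (not the authors' released CLIP-AGIQA implementation), the sketch below prepends learnable prompt tokens to both the text-token and image-patch sequences of a frozen CLIP-style backbone and scores an image from the image-text cosine similarity. The backbone interface (`encode_image_tokens`, `encode_text_tokens`), the toy encoder, and all dimensions are assumptions made for the example.

```python
# Minimal sketch of multi-modal (textual + visual) prompt tuning on a
# frozen CLIP-style backbone. Illustrative only; encoder interfaces and
# dimensions are assumed, not taken from the CLIP-AGIQA code base.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyBackbone(nn.Module):
    """Stand-in for a frozen CLIP-like backbone (hypothetical interface)."""
    def __init__(self, text_dim=512, vis_dim=768, embed_dim=256):
        super().__init__()
        self.txt_proj = nn.Linear(text_dim, embed_dim)
        self.img_proj = nn.Linear(vis_dim, embed_dim)

    def encode_text_tokens(self, tok):   # (B, L, text_dim) -> (B, embed_dim)
        return self.txt_proj(tok.mean(dim=1))

    def encode_image_tokens(self, tok):  # (B, N, vis_dim) -> (B, embed_dim)
        return self.img_proj(tok.mean(dim=1))


class MultiModalPromptLearner(nn.Module):
    def __init__(self, backbone, n_text_ctx=8, n_vis_ctx=8,
                 text_dim=512, vis_dim=768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)       # backbone stays frozen; only prompts are tuned
        # Learnable context tokens for the language branch.
        self.text_ctx = nn.Parameter(torch.randn(n_text_ctx, text_dim) * 0.02)
        # Learnable prompt tokens for the vision branch.
        self.vis_ctx = nn.Parameter(torch.randn(n_vis_ctx, vis_dim) * 0.02)

    def forward(self, patch_tokens, word_tokens):
        # patch_tokens: (B, N_patches, vis_dim) image patch embeddings
        # word_tokens:  (B, N_words, text_dim) embedded quality-prompt text
        b = patch_tokens.size(0)
        vis_in = torch.cat(
            [self.vis_ctx.unsqueeze(0).expand(b, -1, -1), patch_tokens], dim=1)
        txt_in = torch.cat(
            [self.text_ctx.unsqueeze(0).expand(b, -1, -1), word_tokens], dim=1)
        img_feat = F.normalize(self.backbone.encode_image_tokens(vis_in), dim=-1)
        txt_feat = F.normalize(self.backbone.encode_text_tokens(txt_in), dim=-1)
        # Image-text cosine similarity serves as the (unnormalized) quality score.
        return (img_feat * txt_feat).sum(dim=-1)


if __name__ == "__main__":
    model = MultiModalPromptLearner(ToyBackbone())
    scores = model(torch.randn(2, 196, 768), torch.randn(2, 16, 512))
    print(scores.shape)  # torch.Size([2])
```

In such a setup only the prompt parameters would be optimized (e.g., against mean opinion scores, with an auxiliary text-to-image alignment objective as the abstract describes), while the pre-trained backbone weights remain untouched.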