Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
FOS: Computer and information sciences
Computer Vision and Pattern Recognition (cs.CV)
Multimedia (cs.MM)
DOI:
10.1609/aaai.v38i16.29789
Publication Date:
2024-03-25
AUTHORS (4)
ABSTRACT
Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Extensive experiments on various image-text retrieval models and datasets demonstrate that our method can consistently improve retrieval performance and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.
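The abstract does not spell out the loss formulation, so the following is only a rough sketch of how soft-label alignment in the spirit of CSA/USA could look: a frozen uni-modal teacher provides a similarity distribution over the batch, and the retrieval model's similarity distribution is pulled toward it with a KL divergence instead of one-hot contrastive targets. All names, temperatures, and the teacher/student pairing below are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only (assumed PyTorch formulation, not the official CUSA code).
import torch
import torch.nn.functional as F

def soft_label_alignment_loss(student_sim, teacher_sim, tau_student=0.05, tau_teacher=0.05):
    """KL divergence between teacher and student softmax similarity distributions.

    student_sim: (B, B) similarities from the retrieval model
                 (image-text for a CSA-style term, image-image or text-text
                 for a USA-style term).
    teacher_sim: (B, B) similarities from a frozen uni-modal pre-trained model,
                 used as soft labels in place of one-hot targets.
    """
    student_log_prob = F.log_softmax(student_sim / tau_student, dim=-1)
    teacher_prob = F.softmax(teacher_sim / tau_teacher, dim=-1).detach()
    return F.kl_div(student_log_prob, teacher_prob, reduction="batchmean")

# Hypothetical usage with L2-normalized batch embeddings (names are illustrative):
# img_emb, txt_emb: (B, D) embeddings from the image-text retrieval model
# teacher_txt_emb:  (B, D) embeddings from a uni-modal text encoder
# csa_loss = soft_label_alignment_loss(img_emb @ txt_emb.T, teacher_txt_emb @ teacher_txt_emb.T)
# usa_loss = soft_label_alignment_loss(txt_emb @ txt_emb.T, teacher_txt_emb @ teacher_txt_emb.T)
```

Because the sketch only adds auxiliary losses on similarity matrices, it is consistent with the plug-and-play claim: an existing retrieval model's architecture is untouched and only its training objective gains extra terms.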