Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

DOI: 10.48550/arxiv.2404.11317 Publication Date: 2024-04-17
ABSTRACT
The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modifying text. Advanced methods often utilize contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, constructing triplets for CIR incurs high manual annotation costs, resulting in limited positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which reduces the number of negatives available to the model. To address the lack of positives, we propose a data generation method that leverages a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR, whose second stage introduces plenty of static representations of negatives to optimize the representation space rapidly. The above two improvements can be effectively stacked, and both are designed to be plug-and-play, easily applied to existing CIR models without changing their original architectures. Extensive experiments and ablation analysis demonstrate that our method effectively scales positives and negatives and achieves state-of-the-art results on both the FashionIQ and CIRR datasets. In addition, our method also performs well in zero-shot composed image retrieval, providing a new solution for the low-resource scenario.
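To make the second-stage idea concrete, below is a minimal NumPy sketch of an InfoNCE-style contrastive loss in which the usual in-batch negatives are augmented with a pool of precomputed ("static") negative embeddings. This is an illustrative reconstruction based only on the abstract, not the authors' released code; the function name, shapes, and temperature value are assumptions.

```python
import numpy as np

def info_nce_with_static_negatives(query, target, static_neg, temperature=0.07):
    """Contrastive loss over in-batch negatives plus extra static negatives.

    query:      (B, D) composed-query embeddings (reference image + text)
    target:     (B, D) target-image embeddings; row i is the positive for query i
    static_neg: (M, D) cached, frozen embeddings acting as additional negatives
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    q, t, n = l2norm(query), l2norm(target), l2norm(static_neg)

    # Similarity logits: (B, B) in-batch block, then (B, M) static-negative block.
    logits = np.concatenate([q @ t.T, q @ n.T], axis=1) / temperature

    # Log-softmax with max-subtraction for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for query i sits at column i of the in-batch block.
    idx = np.arange(q.shape[0])
    return -log_probs[idx, idx].mean()
```

Because the static negatives are precomputed once and reused, enlarging M adds only a cheap matrix product per step rather than extra encoder forward passes, which is what allows the negative pool to scale far beyond the batch size.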