VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.48550/arxiv.1707.05612 Publication Date: 2017-01-01
ABSTRACT
Accepted as a spotlight presentation at the British Machine Vision Conference (BMVC) 2018. Code: https://github.com/fartashf/vsepp

We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to common loss functions used for multi-modal embeddings. This change, combined with fine-tuning and the use of augmented data, yields significant gains in retrieval performance. We showcase our approach, VSE++, on the MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO, our approach outperforms state-of-the-art methods by 8.8% in caption retrieval and 11.3% in image retrieval (at R@1).
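The "simple change" the abstract refers to is replacing the sum over all in-batch negatives with the single hardest negative in the triplet ranking loss. A minimal NumPy sketch of that max-of-hinges loss, assuming L2-normalized image and caption embeddings where row i of each matrix is a matching pair (function and parameter names are illustrative, not from the paper's code):

```python
import numpy as np

def max_hinge_loss(im, cap, margin=0.2):
    """Triplet ranking loss using only the hardest in-batch negative.

    im, cap: L2-normalized embeddings of shape (batch, dim);
    row i of `im` matches row i of `cap`.
    """
    scores = im @ cap.T                 # cosine similarity for every pair
    pos = np.diag(scores)               # scores of the matching pairs
    # hinge cost of each negative, for both retrieval directions
    cost_cap = np.maximum(0.0, margin + scores - pos[:, None])  # image -> caption
    cost_im = np.maximum(0.0, margin + scores - pos[None, :])   # caption -> image
    # exclude the positive pair itself from the negatives
    mask = np.eye(scores.shape[0], dtype=bool)
    cost_cap[mask] = 0.0
    cost_im[mask] = 0.0
    # the VSE++ change: keep only the hardest negative per query
    # (a sum over all negatives here recovers the baseline loss)
    return cost_cap.max(axis=1).sum() + cost_im.max(axis=0).sum()
```

With perfectly aligned embeddings the loss is zero; negatives within the margin of a positive contribute only their single largest violation per query.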