Dual Learning for Cross-domain Image Captioning

DOI: 10.1145/3132847.3132920
Publication Date: 2017-11-06
ABSTRACT
Recent AI research has witnessed increasing interest in automatically generating image descriptions in text, which is coined as the image captioning problem. Significant progress has been made in domains where plenty of labeled training data (i.e., image-text pairs) are readily available or collected. However, obtaining rich annotated data is a time-consuming and expensive process, creating a substantial barrier for applying image captioning methods to a new domain. In this paper, we propose a cross-domain image captioning approach that uses a novel dual learning mechanism to overcome this barrier. First, we model the alignment between the neural representations of images and natural languages in the source domain, where one can access sufficient labeled data. Second, we adjust the pre-trained model based on examining limited data (or unpaired data) in the target domain. In particular, we introduce a dual learning mechanism with a policy gradient method that generates highly rewarded captions. The mechanism simultaneously optimizes two coupled objectives: generating image descriptions in text and generating plausible images from text descriptions, with the hope that by explicitly exploiting their coupled relation, one can safeguard the performance of image captioning in the target domain. To verify the effectiveness of our model, we use the MSCOCO dataset as the source domain and two other datasets (Oxford-102 and Flickr30k) as the target domains. Our experimental results show that the proposed model consistently outperforms previous methods for cross-domain image captioning.
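To make the dual learning mechanism concrete: the image-to-text captioner is treated as a policy trained with policy gradients (REINFORCE), where the reward for a sampled caption reflects how well the coupled text-to-image dual model can reconstruct the original image from that caption. The sketch below is a minimal, illustrative rendering of one such update on unpaired target-domain images; the Captioner and DualImageModel modules, the reconstruction-based reward, and all hyperparameters are assumptions for illustration, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, IMG_DIM, HID = 1000, 2048, 256


class Captioner(nn.Module):
    """Toy image-to-text policy: samples a caption token by token."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(IMG_DIM, HID)
        self.rnn = nn.GRUCell(HID, HID)
        self.emb = nn.Embedding(VOCAB, HID)
        self.out = nn.Linear(HID, VOCAB)

    def sample(self, img_feat, max_len=12):
        h = torch.tanh(self.proj(img_feat))
        tok = torch.zeros(img_feat.size(0), dtype=torch.long)  # <bos> = 0
        log_probs, tokens = [], []
        for _ in range(max_len):
            h = self.rnn(self.emb(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            tokens.append(tok)
        return torch.stack(tokens, 1), torch.stack(log_probs, 1)


class DualImageModel(nn.Module):
    """Toy text-to-image dual model: reconstructs image features from a caption."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.dec = nn.Linear(HID, IMG_DIM)

    def forward(self, tokens):
        return self.dec(self.emb(tokens).mean(dim=1))


captioner, dual = Captioner(), DualImageModel()
opt = torch.optim.Adam(
    list(captioner.parameters()) + list(dual.parameters()), lr=1e-4
)

# One dual-learning step on a batch of *unpaired* target-domain images.
img_feat = torch.randn(8, IMG_DIM)        # stand-in for CNN image features
tokens, log_probs = captioner.sample(img_feat)

recon = dual(tokens)                      # dual task: text -> image features
# Reward: negative reconstruction error of the original image features,
# an assumed stand-in for the paper's caption/image plausibility rewards.
reward = -F.mse_loss(recon, img_feat, reduction="none").mean(dim=1)
baseline = reward.mean().detach()         # variance-reduction baseline

# REINFORCE: raise log-probs of captions with above-average reward.
pg_loss = -((reward - baseline).detach().unsqueeze(1) * log_probs).mean()
recon_loss = F.mse_loss(recon, img_feat)  # trains the dual model itself

opt.zero_grad()
(pg_loss + recon_loss).backward()
opt.step()
print(f"reward={reward.mean():.4f}  pg_loss={pg_loss:.4f}")

Detaching the reward before the REINFORCE term keeps the policy-gradient update from leaking gradients into the dual model, which is instead trained by its own reconstruction loss; this separation is what couples the two objectives without conflating their gradients.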