Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling
DOI: 10.1609/aaai.v34i07.6780
Publication Date: 2022-09-09T04:37:22Z
AUTHORS (6)
ABSTRACT
Visual storytelling is the task of creating a short story based on photo streams. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions, but also human-like narration and semantics. However, the VIST dataset consists of only a small, fixed number of photos per story. Therefore, the main challenge of visual storytelling is to fill in the visual gap between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full, plausible story even with missing photo(s). Furthermore, we propose for visual storytelling a hide-and-tell model, which is designed to learn non-local relations across the photo streams and to refine and improve conventional RNN-based models. In experiments, we show that our scheme of hide-and-tell and the network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods in automatic metrics. Finally, we qualitatively show the learned ability to interpolate a storyline over visual gaps.
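The hiding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hide_photos`, the use of plain lists as per-photo feature vectors, and the choice to zero out a hidden photo's features are all assumptions made for illustration. The abstract only states that one or more photos are randomly omitted from the input during training.

```python
import random


def hide_photos(photo_features, num_hidden, rng):
    """Return a copy of the stream with `num_hidden` random photos masked.

    Hypothetical helper: masking by zeroing the feature vector is an
    assumption, not something the abstract specifies.
    """
    if not 0 <= num_hidden < len(photo_features):
        raise ValueError("at least one photo must stay visible")
    # Pick which positions in the stream to hide.
    hidden = sorted(rng.sample(range(len(photo_features)), num_hidden))
    masked = [
        [0.0] * len(feat) if i in hidden else list(feat)
        for i, feat in enumerate(photo_features)
    ]
    return masked, hidden


# Toy stream: 5 photos, each with a 4-dimensional feature vector.
rng = random.Random(0)
stream = [[float(i + 1)] * 4 for i in range(5)]
masked, hidden = hide_photos(stream, num_hidden=1, rng=rng)
```

In a training loop, the masked stream would be fed to the model while the target remains the full story for all five photos, so the network has to infer the sentence for the hidden position from the remaining context.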
CITATIONS (17)