Explain Images with Multimodal Recurrent Neural Networks

DOI: 10.48550/arxiv.1410.1090 Publication Date: 2014-01-01
ABSTRACT
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given the previous words and the image; image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12, Flickr 8K, and Flickr 30K). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
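The architecture the abstract describes can be sketched in a few lines: at each time step, a word embedding feeds a recurrent layer, and the word, the recurrent state, and a CNN image feature are projected into a shared multimodal layer, whose output is turned into a distribution over the next word. The NumPy sketch below illustrates this data flow only; the toy vocabulary, layer sizes, random weights (standing in for trained parameters), and greedy decoding (the paper samples from the distribution) are all assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<start>", "<end>", "a", "dog", "runs"]  # toy vocabulary (assumption)
V = len(VOCAB)
E, R, M, D = 8, 16, 12, 10  # embedding, recurrent, multimodal, image-feature dims (assumed)

# Randomly initialized weights stand in for trained parameters.
W_embed = rng.normal(0, 0.1, (V, E))
W_rec_in = rng.normal(0, 0.1, (E, R))
W_rec_h = rng.normal(0, 0.1, (R, R))
W_m_word = rng.normal(0, 0.1, (E, M))
W_m_rec = rng.normal(0, 0.1, (R, M))
W_m_img = rng.normal(0, 0.1, (D, M))  # image feature would come from the CNN sub-network
W_out = rng.normal(0, 0.1, (M, V))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(word_id, h, img_feat):
    """One m-RNN step: P(next word | previous words, image) and the new recurrent state."""
    w = W_embed[word_id]
    h = np.tanh(w @ W_rec_in + h @ W_rec_h)                        # recurrent sub-network
    m = np.tanh(w @ W_m_word + h @ W_m_rec + img_feat @ W_m_img)   # multimodal layer
    return softmax(m @ W_out), h

def generate(img_feat, max_len=5):
    """Decode a caption word by word (argmax here; the paper samples instead)."""
    h, word = np.zeros(R), VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        probs, h = step(word, h, img_feat)
        word = int(probs.argmax())
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

caption = generate(rng.normal(0, 1.0, D))
print(caption)
```

With trained weights, `step` would be applied to real CNN features and the output distribution sampled to produce novel sentence descriptions; the same conditional probability can also score sentence-image pairs for the retrieval tasks mentioned above.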