Explain Images with Multimodal Recurrent Neural Networks

DOI: 10.48550/arxiv.1410.1090 Publication Date: 2014-01-01
ABSTRACT
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given the previous words and the image; image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12, Flickr 8K, and Flickr 30K). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
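The architecture the abstract describes can be sketched in a few lines: at each time step, a word embedding feeds a recurrent layer, and the word, the recurrent state, and a CNN image feature are projected into a shared multimodal layer, whose output is turned into a distribution over the next word. The NumPy sketch below illustrates this data flow only; the toy vocabulary, layer sizes, random weights (standing in for trained parameters), and greedy decoding (the paper samples from the distribution) are all assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<start>", "<end>", "a", "dog", "runs"]  # toy vocabulary (assumption)
V = len(VOCAB)
E, R, M, D = 8, 16, 12, 10  # embedding, recurrent, multimodal, image-feature dims (assumed)

# Randomly initialized weights stand in for trained parameters.
W_embed = rng.normal(0, 0.1, (V, E))
W_rec_in = rng.normal(0, 0.1, (E, R))
W_rec_h = rng.normal(0, 0.1, (R, R))
W_m_word = rng.normal(0, 0.1, (E, M))
W_m_rec = rng.normal(0, 0.1, (R, M))
W_m_img = rng.normal(0, 0.1, (D, M))  # image feature would come from the CNN sub-network
W_out = rng.normal(0, 0.1, (M, V))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(word_id, h, img_feat):
    """One m-RNN step: P(next word | previous words, image) and the new recurrent state."""
    w = W_embed[word_id]
    h = np.tanh(w @ W_rec_in + h @ W_rec_h)                        # recurrent sub-network
    m = np.tanh(w @ W_m_word + h @ W_m_rec + img_feat @ W_m_img)   # multimodal layer
    return softmax(m @ W_out), h

def generate(img_feat, max_len=5):
    """Decode a caption word by word (argmax here; the paper samples instead)."""
    h, word = np.zeros(R), VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        probs, h = step(word, h, img_feat)
        word = int(probs.argmax())
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

caption = generate(rng.normal(0, 1.0, D))
print(caption)
```

With trained weights, `step` would be applied to real CNN features and the output distribution sampled to produce novel sentence descriptions; the same conditional probability can also score sentence-image pairs for the retrieval tasks mentioned above.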