Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts

Interpretability; Bounding overwatch; Minimum bounding box
DOI: 10.48550/arxiv.1803.11209 Publication Date: 2018-01-01
ABSTRACT
Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, the method is able to consider significantly more proposals and doesn't rely on a successful first stage hypothesizing bounding box proposals. Beyond that, we demonstrate that the trained parameters of our model can be used as word-embeddings which capture spatial-image relationships and provide interpretability. Lastly, at the time of submission, our approach outperformed the current state-of-the-art methods on the Flickr 30k Entities and the ReferItGame dataset by 3.08% and 7.77% respectively.
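To make the idea of searching over all possible bounding boxes concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical per-pixel score map (positive where a pixel supports the query phrase, negative elsewhere) and scores every axis-aligned box by the sum of the scores it contains, using a summed-area table so that each box costs O(1). The function names `integral_image`, `box_sum`, and `best_box` are illustrative, and the brute-force enumeration stands in for whatever efficient search the full paper uses, which this abstract does not specify.

```python
import numpy as np


def integral_image(score_map):
    """Summed-area table with an extra zero row and column at the top-left."""
    sat = np.zeros((score_map.shape[0] + 1, score_map.shape[1] + 1))
    sat[1:, 1:] = score_map.cumsum(axis=0).cumsum(axis=1)
    return sat


def box_sum(sat, y0, x0, y1, x1):
    """Sum of score_map[y0:y1, x0:x1], computed in O(1) from the table."""
    return sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]


def best_box(score_map):
    """Score every axis-aligned box and return (score, (y0, x0, y1, x1)).

    Assumption: a box's score is the plain sum of the per-pixel scores inside it.
    Brute force over all boxes is O(H^2 * W^2), so this is only for small maps.
    """
    h, w = score_map.shape
    sat = integral_image(score_map)
    best_score, best_coords = -np.inf, None
    for y0 in range(h):
        for y1 in range(y0 + 1, h + 1):
            for x0 in range(w):
                for x1 in range(x0 + 1, w + 1):
                    s = box_sum(sat, y0, x0, y1, x1)
                    if s > best_score:
                        best_score, best_coords = s, (y0, x0, y1, x1)
    return best_score, best_coords
```

On a small map that is negative everywhere except for a positive rectangular patch, `best_box` returns exactly that patch, since enlarging the box adds negative scores and shrinking it drops positive ones. No proposal stage is involved: every box is a candidate.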