Overcoming Language Priors in VQA via Decomposed Linguistic Representations

DOI: 10.1609/aaai.v34i07.6776
Publication Date: 2020-06-29T18:38:54Z
ABSTRACT
Most existing Visual Question Answering (VQA) models overly rely on language priors between questions and answers. In this paper, we present a novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers, thereby overcoming language priors. We introduce a modular language attention mechanism to parse a question into three phrase representations: a type representation, an object representation, and a concept representation. We use the type representation to identify the question type and the possible answer set (a yes/no answer or specific concepts such as colors or numbers), and use the object representation to focus on the relevant region of an image. The concept representation is then verified with the attended region to infer the final answer. The proposed method decouples language-based concept discovery from vision-based concept verification in the process of answer inference, preventing language priors from dominating the answering process. Experiments on the VQA-CP dataset demonstrate the effectiveness of our method.
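The pipeline the abstract describes can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering of the idea, not the authors' released implementation: three learned role queries attend over question words to produce the type, object, and concept representations; the type representation gates the candidate answer set; the object representation attends over image regions; and the concept representation is verified against the attended visual feature. All module names, dimensions, and the simple dot-product attention are illustrative assumptions.

```python
# Minimal sketch of decomposed linguistic representations for VQA.
# Hypothetical illustration only; architecture details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedVQA(nn.Module):
    def __init__(self, d=512, num_answers=1000):
        super().__init__()
        # One learned query per phrase role: type / object / concept.
        self.phrase_queries = nn.Parameter(torch.randn(3, d))
        # Type representation -> soft mask over the candidate answer set.
        self.type_to_mask = nn.Linear(d, num_answers)
        # Concept evidence is compared against answer embeddings.
        self.answer_embed = nn.Parameter(torch.randn(num_answers, d))
        self.vis_proj = nn.Linear(d, d)

    def attend(self, query, keys):
        # query: (d,), keys: (B, N, d) -> attention-weighted sum over N.
        scores = torch.einsum('d,bnd->bn', query, keys)
        weights = F.softmax(scores, dim=1)
        return torch.einsum('bn,bnd->bd', weights, keys)

    def forward(self, word_feats, region_feats):
        # word_feats:   (B, T, d) question word features
        # region_feats: (B, R, d) image region features
        type_rep = self.attend(self.phrase_queries[0], word_feats)
        obj_rep = self.attend(self.phrase_queries[1], word_feats)
        concept_rep = self.attend(self.phrase_queries[2], word_feats)

        # Language-based discovery: the type representation restricts
        # which answers are plausible (e.g. yes/no vs. colors).
        answer_mask = torch.sigmoid(self.type_to_mask(type_rep))  # (B, A)

        # Visual grounding: the object representation attends over regions.
        scores = torch.einsum('bd,brd->br', obj_rep, region_feats)
        att = F.softmax(scores, dim=1)
        attended = torch.einsum('br,brd->bd', att, region_feats)   # (B, d)

        # Vision-based verification: score each candidate answer by how
        # well concept + attended visual evidence matches its embedding.
        evidence = concept_rep + self.vis_proj(attended)           # (B, d)
        verify = evidence @ self.answer_embed.t()                  # (B, A)

        # Gate verification scores with the language-derived mask, so
        # language alone cannot select an answer without visual support.
        return verify * answer_mask

# Usage with random tensors (batch of 2, 14 words, 36 regions):
model = DecomposedVQA()
logits = model(torch.randn(2, 14, 512), torch.randn(2, 36, 512))
print(logits.shape)  # torch.Size([2, 1000])
```

The key design point this sketch tries to capture is the decoupling: the answer mask comes only from the question (discovery), while the verification score requires attended visual evidence, so neither path alone determines the final answer.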