Localize, Assemble, and Predicate: Contextual Object Proposal Embedding for Visual Relation Detection
DOI: 10.1609/aaai.v34i07.6913
Publication Date: 2020-06-29
ABSTRACT
Visual relation detection (VRD) aims to describe all interacting objects in an image using subject-predicate-object triplets. Critically, the number of valid relations grows combinatorially as O(C²R) for C object categories and R relationship types; for example, 100 categories and 70 predicates already yield 700,000 possible triplets. The frequencies of these triplets exhibit a long-tailed distribution, which inevitably biases the learned VRD model towards popular visual relations. To address this problem, we propose the localize-assemble-predicate network (LAP-Net), which decomposes VRD into three sub-tasks: localizing individual objects, assembling subject-object pairs, and predicting the predicate for each pair. In the first stage of LAP-Net, a Region Proposal Network (RPN) generates a few class-agnostic object proposals. Next, these proposals are assembled into subject-object pairs by a second Pair Proposal Network (PPN) through a novel contextual embedding scheme: the inner product between the embedded representations of two proposals faithfully reflects their compatibility as a pair, without estimating the subject or object class. Top-ranked pairs from the first two stages are fed into a third sub-network, which precisely estimates the relationship. The whole pipeline, except for the last stage, is object-category-agnostic when localizing relationships in an image, alleviating the bias induced by the long-tailed training data. LAP-Net can be trained in an end-to-end fashion. We demonstrate that it achieves state-of-the-art performance on benchmarks while maintaining a high inference speed.
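The abstract does not specify the PPN's architecture, but the pair-scoring idea it describes — embedding each class-agnostic proposal and ranking subject-object pairs by the inner product of their embeddings — can be sketched in PyTorch. This is a minimal sketch under assumed details: the class name PairProposalScorer, the two embedding heads (subj_head, obj_head), and all dimensions are hypothetical illustrations, not the paper's actual design.

```python
# Hypothetical sketch of inner-product pair scoring over class-agnostic
# proposals. Names and layer sizes are assumptions, not the paper's.
import torch
import torch.nn as nn


class PairProposalScorer(nn.Module):
    def __init__(self, feat_dim: int = 1024, embed_dim: int = 256):
        super().__init__()
        # Separate heads let a proposal embed differently in the
        # subject role than in the object role.
        self.subj_head = nn.Sequential(
            nn.Linear(feat_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.obj_head = nn.Sequential(
            nn.Linear(feat_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, proposal_feats: torch.Tensor) -> torch.Tensor:
        # proposal_feats: (N, feat_dim) pooled features for N proposals.
        s = self.subj_head(proposal_feats)  # (N, embed_dim)
        o = self.obj_head(proposal_feats)   # (N, embed_dim)
        # Entry (i, j) scores proposal i as subject, proposal j as object.
        return s @ o.t()


# Usage: score 50 proposals, then keep the top 100 pairs for a
# downstream predicate-classification stage.
scorer = PairProposalScorer()
feats = torch.randn(50, 1024)  # stand-in for RoI-pooled features
with torch.no_grad():
    scores = scorer(feats)
    scores.fill_diagonal_(float("-inf"))  # a proposal cannot pair with itself
    top = scores.flatten().topk(100)
    subj_idx = torch.div(top.indices, 50, rounding_mode="floor")
    obj_idx = top.indices % 50
```

Because the two heads are distinct, the score matrix is asymmetric: scoring proposal i as subject with proposal j as object differs from the reverse, which matches the directed nature of subject-predicate-object triplets. Note the scorer never looks at class labels, consistent with the abstract's claim that pair assembly is object-category-agnostic.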