Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
DOI: 10.48550/arxiv.2303.13233
Publication Date: 2023-01-01
AUTHORS (6)
ABSTRACT
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for visual understanding. Although recent works have made steady progress on SGG, they still suffer from long-tail distribution issues: tail predicates are more costly to train and harder to distinguish due to the small amount of annotated data compared to frequent predicates. Existing re-balancing strategies try to handle this via prior rules, but they are confined to pre-defined conditions and thus not scalable across various models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, in which a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. The proposed CaCao can be applied in a plug-and-play fashion to automatically strengthen existing SGG models and tackle the long-tailed problem. Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), which generalizes to unseen predicates in a zero-shot manner. Comprehensive experiments on three benchmark datasets show that CaCao consistently boosts the performance of multiple SGG models in a model-agnostic way. Moreover, our Epic achieves competitive performance on open-world predicate prediction. The code of this paper is publicly available.
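The plug-and-play boosting idea from the abstract can be pictured as rewriting coarse head predicates in existing <subject, predicate, object> triplets with finer-grained candidates scored by a language model. The sketch below is only illustrative: the candidate table and the `scorer` callable are hypothetical stand-ins for the visually-prompted language model the paper actually uses.

```python
# Toy sketch of plug-and-play predicate boosting (hypothetical names;
# the real CaCao framework uses a visually-prompted language model,
# not a static lookup table or a length-based scorer).

# Coarse head predicates mapped to hypothetical fine-grained candidates.
FINE_GRAINED = {
    "on": ["sitting on", "standing on", "mounted on"],
    "has": ["wearing", "holding", "carrying"],
}

def boost_triplet(subject, predicate, obj, scorer):
    """Replace a coarse predicate with the best-scoring fine-grained one.

    `scorer` stands in for the language model's compatibility score for a
    candidate triplet; any callable returning a float works here.
    """
    candidates = FINE_GRAINED.get(predicate)
    if not candidates:
        # Predicate is already fine-grained (or unknown): keep it unchanged.
        return (subject, predicate, obj)
    best = max(candidates, key=lambda p: scorer(subject, p, obj))
    return (subject, best, obj)

# Example with a trivial scorer that prefers longer (more specific) predicates.
toy_scorer = lambda s, p, o: len(p)
print(boost_triplet("man", "on", "horse", toy_scorer))
# → ('man', 'standing on', 'horse')
```

In this sketch, augmenting the training set with such rewritten triplets is what "strengthening existing SGG models in a plug-and-play fashion" amounts to: the base model's architecture is untouched, only its supervision is enriched.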