Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
DOI: 10.48550/arxiv.2303.13233
Publication Date: 2023-01-01
AUTHORS (6)
ABSTRACT
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for visual understanding. Although recent works have made steady progress on SGG, they still suffer from long-tail distribution issues: tail predicates are more costly to train and harder to distinguish due to the small amount of annotated data compared to frequent predicates. Existing re-balancing strategies try to handle this via prior rules, but they are confined to pre-defined conditions and thus not scalable across various models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, in which a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. The proposed CaCao can be applied in a plug-and-play fashion to automatically strengthen existing SGG models and tackle the long-tailed problem. Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), which generalizes to unseen predicates in a zero-shot manner. Comprehensive experiments on three benchmark datasets show that CaCao consistently boosts the performance of multiple SGG models in a model-agnostic way. Moreover, our Epic achieves competitive performance on open-world predicate prediction. The code of this paper is publicly available.
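The plug-and-play boosting idea from the abstract can be pictured as rewriting coarse head predicates in existing <subject, predicate, object> triplets with finer-grained candidates scored by a language model. The sketch below is only illustrative: the candidate table and the `scorer` callable are hypothetical stand-ins for the visually-prompted language model the paper actually uses.

```python
# Toy sketch of plug-and-play predicate boosting (hypothetical names;
# the real CaCao framework uses a visually-prompted language model,
# not a static lookup table or a length-based scorer).

# Coarse head predicates mapped to hypothetical fine-grained candidates.
FINE_GRAINED = {
    "on": ["sitting on", "standing on", "mounted on"],
    "has": ["wearing", "holding", "carrying"],
}

def boost_triplet(subject, predicate, obj, scorer):
    """Replace a coarse predicate with the best-scoring fine-grained one.

    `scorer` stands in for the language model's compatibility score for a
    candidate triplet; any callable returning a float works here.
    """
    candidates = FINE_GRAINED.get(predicate)
    if not candidates:
        # Predicate is already fine-grained (or unknown): keep it unchanged.
        return (subject, predicate, obj)
    best = max(candidates, key=lambda p: scorer(subject, p, obj))
    return (subject, best, obj)

# Example with a trivial scorer that prefers longer (more specific) predicates.
toy_scorer = lambda s, p, o: len(p)
print(boost_triplet("man", "on", "horse", toy_scorer))
# → ('man', 'standing on', 'horse')
```

In this sketch, augmenting the training set with such rewritten triplets is what "strengthening existing SGG models in a plug-and-play fashion" amounts to: the base model's architecture is untouched, only its supervision is enriched.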