Language-guided Residual Graph Attention Network and Data Augmentation for Visual Grounding

DOI: 10.1145/3604557
Publication date: 2023-06-14
ABSTRACT
Visual grounding is an essential task in understanding the semantic relationship between a given text description and the target object in an image. Due to the innate complexity of language and the rich context of an image, it is still a challenging problem to infer the underlying relationships and perform reasoning over the objects in an image referred to by an expression. Although existing visual grounding methods have achieved promising progress, cross-modal mapping across different domains is still not well handled, especially when the expressions are complex and long. To address this issue, we propose a language-guided residual graph attention network for visual grounding (LRGAT-VG), which enables us to apply deeper graph convolution layers with the assistance of residual connections between them. This allows it to handle long and complex expressions better than other graph-based methods. Furthermore, we propose Language-guided Data Augmentation (LGDA), based on copy-paste operations on pairs of source and target images, to increase the diversity of the training data while maintaining the correspondence between the visual and linguistic content. With extensive experiments on three benchmarks, including RefCOCO, RefCOCO+, and RefCOCOg, LRGAT-VG with LGDA achieves competitive performance against state-of-the-art network-based referring expression approaches, demonstrating its effectiveness.
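The residual graph attention idea from the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' released code: the class names, dimensions, and the plain dot-product attention are assumptions, and the language-guided conditioning described in the paper is omitted for brevity. What it shows is the key claim, that a skip connection around each graph attention layer lets deeper stacks of graph layers remain trainable.

```python
# Minimal sketch of stacked residual graph attention layers (illustrative
# assumptions throughout; not the paper's LRGAT-VG implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualGraphAttentionLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (N, dim) object features; adj: (N, N) 0/1 adjacency, assumed
        # to include self-loops so every row has at least one neighbor.
        scores = self.query(nodes) @ self.key(nodes).t() / nodes.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        # Residual connection: the identity path lets gradients bypass the
        # attention block, which is what permits stacking deeper layers.
        return self.norm(nodes + attn @ self.value(nodes))

class ResidualGAT(nn.Module):
    def __init__(self, dim: int, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            ResidualGraphAttentionLayer(dim) for _ in range(num_layers)
        )

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            nodes = layer(nodes, adj)
        return nodes
```

A stack such as `ResidualGAT(dim=256)(nodes, adj)` can then refine detected-object features; without the identity path, stacking several plain graph attention layers tends to degrade and oversmooth node features, which is the failure mode residual connections are meant to avoid.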
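The LGDA step can be sketched in the same spirit. The snippet below is a simplification under stated assumptions, not the paper's procedure: it pastes an annotated object crop from a source image into a target image and carries the referring expression along with it, so the augmented image-text pair stays aligned. The `Grounded` container, the fixed paste offset, and the hard rectangular paste are all hypothetical.

```python
# Minimal copy-paste augmentation sketch (hypothetical names and layout;
# not the paper's LGDA code).
from dataclasses import dataclass
import numpy as np

@dataclass
class Grounded:
    image: np.ndarray   # (H, W, 3) uint8 image
    box: tuple          # (x, y, w, h) of the referred object
    expression: str     # referring expression for that object

def copy_paste(source: Grounded, target: Grounded, offset=(0, 0)) -> Grounded:
    """Paste the source's annotated object into the target image."""
    x, y, w, h = source.box
    px, py = offset
    H, W = target.image.shape[:2]
    # Sketch assumption: the offset keeps the patch fully inside the target.
    assert px + w <= W and py + h <= H, "offset places the patch out of bounds"
    patch = source.image[y:y + h, x:x + w]
    out = target.image.copy()
    out[py:py + h, px:px + w] = patch
    # The expression still describes the pasted object, now at its new box,
    # so the vision-language correspondence is preserved after augmentation.
    return Grounded(image=out, box=(px, py, w, h), expression=source.expression)
```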