Improving Referring Image Segmentation using Vision-Aware Text Features
DOI: 10.48550/arXiv.2404.08590
Publication Date: 2024-04-12
AUTHORS (6)
ABSTRACT
Referring image segmentation is a challenging task that involves generating pixel-wise masks based on natural language descriptions. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. This over-reliance on visual features can lead to suboptimal results, especially in complex scenarios where text prompts are ambiguous or context-dependent. To overcome these challenges, we present a novel framework, VATEX, that improves referring image segmentation by enhancing object and context understanding with Vision-Aware Text Features. Our method uses CLIP to derive a CLIP Prior that integrates an object-centric heatmap with the text description, which can be used as an initial query in a DETR-based architecture for the segmentation task. Furthermore, observing that there are multiple ways to describe the same instance in an image, we enforce feature similarity between text variations referring to the same visual input through two components: a Contextual Multimodal Decoder that turns text embeddings into vision-aware text features, and a Meaning Consistency Constraint that further ensures a coherent and consistent interpretation of language expressions with the context understanding obtained from the image. Our method achieves significant performance improvements on three benchmark datasets: RefCOCO, RefCOCO+, and G-Ref. Code is available at: https://nero1342.github.io/VATEX_RIS
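Two ideas from the abstract lend themselves to a short illustration: an object-centric heatmap obtained by matching a CLIP-style text embedding against patch-level image features, and a consistency loss that pulls together the features of two expressions describing the same instance. The following is a minimal PyTorch sketch of these ideas, not the authors' implementation; the tensor names (patch_feats, text_feat) and the cosine-distance loss are assumptions, and the actual VATEX formulation may differ.

    # Minimal sketch (not the authors' code) of two ideas from the abstract.
    import torch
    import torch.nn.functional as F

    def clip_prior_heatmap(patch_feats: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        """Object-centric heatmap from CLIP-style features.

        patch_feats: (B, N, D) patch embeddings from an image encoder.
        text_feat:   (B, D) sentence embedding from a text encoder.
        Returns:     (B, N) similarity map, softmax-normalized over patches.
        """
        p = F.normalize(patch_feats, dim=-1)
        t = F.normalize(text_feat, dim=-1)
        sim = torch.einsum("bnd,bd->bn", p, t)  # cosine similarity per patch
        return sim.softmax(dim=-1)

    def meaning_consistency_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        """Encourage two paraphrases of the same instance to yield similar
        vision-aware text features; feat_a and feat_b are both (B, D)."""
        a = F.normalize(feat_a, dim=-1)
        b = F.normalize(feat_b, dim=-1)
        return (1.0 - (a * b).sum(dim=-1)).mean()  # mean cosine distance

    # Toy usage with random tensors standing in for encoder outputs.
    B, N, D = 2, 196, 512
    heatmap = clip_prior_heatmap(torch.randn(B, N, D), torch.randn(B, D))
    loss = meaning_consistency_loss(torch.randn(B, D), torch.randn(B, D))
    print(heatmap.shape, loss.item())

In the paper's pipeline, a heatmap of this kind would inform the initial queries of the DETR-based decoder, and the consistency loss would be applied to the vision-aware text features produced by the Contextual Multimodal Decoder.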