Segment Everything Everywhere All at Once
DOI:
10.48550/arxiv.2304.06718
Publication Date:
2023-01-01
AUTHORS (7)
ABSTRACT
In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image, as shown in Fig. 1. We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles, and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of the two prompt types required for various segmentation tasks; iii) Interactivity. We incorporate learnable memory prompts into the decoder to retain segmentation history through mask-guided cross-attention from image features; iv) Semantic-awareness. We use a text encoder to encode text queries and mask labels into the same semantic space for open-vocabulary segmentation. We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks. Notably, our single model achieves competitive performance across interactive segmentation, generic segmentation, referring segmentation, and video object segmentation on 9 datasets with a minimum of 1/100 supervision. Furthermore, SEEM showcases a remarkable capacity to generalize to novel prompts or their combinations, rendering it a readily universal segmentation interface.
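The versatility desideratum above hinges on one idea: heterogeneous spatial prompts (points, boxes, scribbles, masks) can all be rasterized onto the image grid and pooled into query vectors that live in a single space, so the decoder handles them uniformly. The sketch below illustrates that idea only; all names, shapes, and the mean-pooling step are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy stand-ins for the model's feature map; values are illustrative.
GRID = 16   # hypothetical feature-map resolution
DIM = 8     # hypothetical embedding dimension

rng = np.random.default_rng(0)
FEATURES = rng.standard_normal((GRID, GRID, DIM))  # stand-in image features

def visual_prompt(mask: np.ndarray) -> np.ndarray:
    """Pool image features under a binary spatial mask into one query vector."""
    assert mask.shape == (GRID, GRID)
    if mask.sum() == 0:
        return np.zeros(DIM)
    return FEATURES[mask.astype(bool)].mean(axis=0)

def point_to_mask(y: int, x: int) -> np.ndarray:
    """Rasterize a single click onto the grid."""
    m = np.zeros((GRID, GRID))
    m[y, x] = 1
    return m

def box_to_mask(y0: int, x0: int, y1: int, x1: int) -> np.ndarray:
    """Rasterize a bounding box onto the grid."""
    m = np.zeros((GRID, GRID))
    m[y0:y1, x0:x1] = 1
    return m

# A click and a box both become queries of the same shape, so a decoder
# can attend to either (or both) without prompt-type-specific branches.
q_point = visual_prompt(point_to_mask(4, 4))
q_box = visual_prompt(box_to_mask(2, 2, 7, 7))
print(q_point.shape, q_box.shape)  # both (DIM,)
```

Because every spatial prompt reduces to the same query format, composing it with a text query (encoded into the same joint space, per desideratum ii) is a matter of concatenating queries rather than adding new model heads.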