CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

DOI: 10.48550/arxiv.2303.11797 Publication Date: 2023-01-01
ABSTRACT
Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions. In this work, we introduce a novel cost-based approach to adapt vision-language foundation models, notably CLIP, for the intricate task of semantic segmentation. Through aggregating the cosine similarity score, i.e., the cost volume between image and text embeddings, our method potently adapts CLIP for segmenting seen and unseen classes by fine-tuning its encoders, addressing the challenges faced by existing methods in handling unseen classes. Building upon this, we explore methods to effectively aggregate the cost volume, considering its multi-modal nature of being established between image and text embeddings. Furthermore, we examine various methods for efficiently fine-tuning CLIP.
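To make the term "cost volume" concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-pixel image embeddings and per-class text embeddings are already available as plain lists of floats; the function names (`l2_normalize`, `cost_volume`, `argmax_labels`) and the toy values are hypothetical. The per-pixel argmax at the end is only a naive baseline for illustration: the learned aggregation of the cost volume and the fine-tuning of the CLIP encoders described in the abstract are not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cost_volume(image_embeddings, text_embeddings):
    """Return volume[i][j][n]: the cosine similarity between the image
    embedding at spatial position (i, j) and the text embedding of class n."""
    texts = [l2_normalize(t) for t in text_embeddings]
    volume = []
    for row in image_embeddings:
        out_row = []
        for pixel in row:
            p = l2_normalize(pixel)
            # Dot product of unit vectors equals their cosine similarity.
            out_row.append([sum(a * b for a, b in zip(p, t)) for t in texts])
        volume.append(out_row)
    return volume


def argmax_labels(volume):
    """Naive baseline (an assumption for illustration): pick the class with
    the highest raw similarity at each pixel, with no aggregation step."""
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in volume
    ]


# Toy 2x2 "image" with 3-dimensional embeddings and 3 candidate classes.
image_embeddings = [
    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
    [[0.0, 0.0, 3.0], [0.5, 2.0, 0.0]],
]
text_embeddings = [
    [1.0, 0.0, 0.0],  # class 0
    [0.0, 1.0, 0.0],  # class 1
    [0.0, 0.0, 1.0],  # class 2
]

volume = cost_volume(image_embeddings, text_embeddings)
labels = argmax_labels(volume)
```

The resulting `volume` has shape height x width x number of classes, so each class contributes one similarity map over the image. Because classes enter only through their text embeddings, new class names can be added at inference time by appending rows to `text_embeddings`, which is what makes the setting open-vocabulary.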