ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
DOI:
10.48550/arxiv.2302.10307
Publication Date:
2023-01-01
AUTHORS (8)
ABSTRACT
Recently, great success has been made in learning visual representations from text supervision, facilitating the emergence of text-supervised semantic segmentation. However, existing works focus on pixel grouping and cross-modal alignment, while ignoring the correspondence among multiple augmented views of the same image. To overcome this limitation, we propose multi-View Consistent learning (ViewCo) for text-supervised semantic segmentation. Specifically, we first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image. Additionally, we propose cross-view segmentation consistency modeling to address the ambiguity issue of text supervision by contrasting the segment features of Siamese visual encoders. The text-to-views consistency benefits the dense assignment of visual features by encouraging different crops to align with the same text, while the cross-view segmentation consistency provides additional self-supervision, overcoming the limitation of ambiguous text supervision for segmentation masks. Trained on large-scale image-text data, our model can directly segment objects of arbitrary categories in a zero-shot manner. Extensive experiments show that ViewCo outperforms state-of-the-art methods on average by up to 2.9%, 1.6%, and 2.4% mIoU on PASCAL VOC2012, PASCAL Context, and COCO, respectively.
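The cross-view segmentation consistency described above can be illustrated with a small PyTorch sketch. This is not the authors' implementation: the encoder, the query-based segment pooling, the symmetric InfoNCE-style loss, and all dimensions below are illustrative assumptions, intended only to show the idea of contrasting segment features produced by a Siamese encoder from two augmented views of the same image.

```python
# Minimal sketch (assumptions, not the ViewCo code): cross-view segment-feature
# consistency. Two augmented views are encoded into per-segment embeddings; the
# matching segment slots of the two views are treated as positives in a
# symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentEncoder(nn.Module):
    """Toy encoder mapping an image to a fixed number of segment embeddings."""
    def __init__(self, num_segments=8, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Learnable segment queries pooled against the feature map; a stand-in
        # for the grouping mechanism used in text-supervised segmentation.
        self.queries = nn.Parameter(torch.randn(num_segments, dim))

    def forward(self, x):
        feat = self.backbone(x)                          # (B, D, H, W)
        feat = feat.flatten(2).transpose(1, 2)           # (B, HW, D)
        attn = torch.einsum("qd,bnd->bqn", self.queries, feat).softmax(dim=-1)
        seg = torch.einsum("bqn,bnd->bqd", attn, feat)   # (B, Q, D) segment features
        return F.normalize(seg, dim=-1)

def cross_view_consistency_loss(seg_a, seg_b, temperature=0.07):
    """Symmetric InfoNCE between matching segment slots of two views."""
    B, Q, D = seg_a.shape
    a = seg_a.reshape(B * Q, D)
    b = seg_b.reshape(B * Q, D)
    logits = a @ b.t() / temperature                     # (BQ, BQ) similarities
    targets = torch.arange(B * Q, device=a.device)       # matching slots are positives
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    encoder = SegmentEncoder()
    view1 = torch.randn(4, 3, 64, 64)                    # two augmented crops of a batch
    view2 = torch.randn(4, 3, 64, 64)
    loss = cross_view_consistency_loss(encoder(view1), encoder(view2))
    print(f"cross-view consistency loss: {loss.item():.4f}")
```

In the paper's formulation the two views are processed by Siamese encoders (e.g., an online and a momentum branch) and the consistency term is combined with the text-to-views alignment objective; the sketch uses a single shared encoder purely for brevity.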