RevColV2: Exploring Disentangled Representations in Masked Image Modeling
KEYWORDS (3)
Autoencoder
Code
Deep Neural Networks
DOI:
10.48550/arXiv.2309.01005
Publication Date:
2023-09
AUTHORS (3)
Qi Han, Yuxuan Cai, Xiangyu Zhang
ABSTRACT
Masked image modeling (MIM) has become a prevalent pre-training setup for vision foundation models and attains promising performance. Despite its success, existing MIM methods discard the decoder network during downstream applications, resulting in inconsistent representations between pre-training and fine-tuning that can hamper downstream task performance. In this paper, we propose a new architecture, RevColV2, which tackles this issue by keeping the entire autoencoder architecture during both pre-training and fine-tuning. The main body of RevColV2 contains bottom-up columns and top-down columns, between which information is reversibly propagated and gradually disentangled. Such a design endows our architecture with a nice property: maintaining disentangled low-level and semantic information at the end of the network in pre-training. Our experimental results suggest that a foundation model with decoupled features can achieve competitive performance across multiple downstream vision tasks, such as image classification, semantic segmentation and object detection. For example, after intermediate fine-tuning on the ImageNet-22K dataset, RevColV2-L attains 88.4% top-1 accuracy on ImageNet-1K classification and 58.6 mIoU on ADE20K semantic segmentation. With an extra teacher and a large-scale dataset, RevColV2-L achieves 62.1 box AP on COCO detection and 60.4 mIoU on ADE20K semantic segmentation. Code and models are released at https://github.com/megvii-research/RevCol
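The reversible propagation between columns mentioned in the abstract can be illustrated with a small invertible residual connection. Below is a minimal PyTorch sketch, not the authors' released implementation: the names ReversibleColumnUnit, alpha, and the MLP used as the per-level block are illustrative assumptions. It shows only the core property that such multi-column designs rely on: the feature carried over from the previous column can be exactly recovered from the unit's output, so earlier-column activations can be recomputed rather than stored during training.

import torch
import torch.nn as nn


# Minimal sketch of an invertible column-to-column connection
# (names and the MLP block are illustrative stand-ins, not RevColV2 code).
class ReversibleColumnUnit(nn.Module):
    """Fuses the same-level feature carried over from the previous column
    (x_prev_col) with this column's block output, via an invertible residual."""

    def __init__(self, dim: int, alpha: float = 1.0):
        super().__init__()
        self.alpha = alpha  # invertible scaling on the carried-over feature
        # stand-in for the per-level sub-network (attention/conv stack in practice)
        self.block = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x_prev_col: torch.Tensor, x_in: torch.Tensor) -> torch.Tensor:
        # reversible update: out = alpha * previous-column feature + block(input)
        return self.alpha * x_prev_col + self.block(x_in)

    def inverse(self, x_out: torch.Tensor, x_in: torch.Tensor) -> torch.Tensor:
        # exact inversion: the previous column's feature is recoverable from the
        # output, so it need not be kept in memory as more columns are stacked
        return (x_out - self.block(x_in)) / self.alpha


if __name__ == "__main__":
    unit = ReversibleColumnUnit(dim=64)
    x_prev = torch.randn(2, 16, 64)  # same-level feature from the previous column
    x_in = torch.randn(2, 16, 64)    # input feature entering this unit's block
    out = unit(x_prev, x_in)
    recovered = unit.inverse(out, x_in)
    assert torch.allclose(recovered, x_prev, atol=1e-5)

Because the residual form is exactly invertible, information passes between columns without loss; this lossless propagation is the mechanism the abstract credits for gradually disentangling low-level and semantic features across columns.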