RevColV2: Exploring Disentangled Representations in Masked Image Modeling
KEYWORDS (3)
Autoencoder
Code
Deep Neural Networks
DOI:
10.48550/arXiv.2309.01005
Publication Date:
2023-09
AUTHORS (3)
Qi Han, Yuxuan Cai, Xiangyu Zhang
ABSTRACT
Masked image modeling (MIM) has become a prevalent pre-training setup for vision foundation models and attains promising performance. Despite its success, existing MIM methods discard the decoder network during downstream applications, resulting in inconsistent representations between pre-training and fine-tuning that can hamper downstream task performance. In this paper, we propose a new architecture, RevColV2, which tackles this issue by keeping the entire autoencoder architecture during both pre-training and fine-tuning. The main body of RevColV2 contains bottom-up columns and top-down columns, between which information is reversibly propagated and gradually disentangled. Such a design endows our architecture with a nice property: maintaining disentangled low-level and semantic information at the end of the network in pre-training. Our experimental results suggest that a foundation model with decoupled features can achieve competitive performance across multiple downstream vision tasks, such as image classification, semantic segmentation and object detection. For example, after intermediate fine-tuning on the ImageNet-22K dataset, RevColV2-L attains 88.4% top-1 accuracy on ImageNet-1K classification and 58.6 mIoU on ADE20K semantic segmentation. With an extra teacher and a large-scale dataset, RevColV2-L achieves 62.1 box AP on COCO detection and 60.4 mIoU on ADE20K semantic segmentation. Code and models are released at https://github.com/megvii-research/RevCol
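The reversible propagation between columns mentioned in the abstract can be illustrated with a small invertible residual connection. Below is a minimal PyTorch sketch, not the authors' released implementation: the names ReversibleColumnUnit, alpha, and the MLP used as the per-level block are illustrative assumptions. It shows only the core property that such multi-column designs rely on: the feature carried over from the previous column can be exactly recovered from the unit's output, so earlier-column activations can be recomputed rather than stored during training.

import torch
import torch.nn as nn


# Minimal sketch of an invertible column-to-column connection
# (names and the MLP block are illustrative stand-ins, not RevColV2 code).
class ReversibleColumnUnit(nn.Module):
    """Fuses the same-level feature carried over from the previous column
    (x_prev_col) with this column's block output, via an invertible residual."""

    def __init__(self, dim: int, alpha: float = 1.0):
        super().__init__()
        self.alpha = alpha  # invertible scaling on the carried-over feature
        # stand-in for the per-level sub-network (attention/conv stack in practice)
        self.block = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x_prev_col: torch.Tensor, x_in: torch.Tensor) -> torch.Tensor:
        # reversible update: out = alpha * previous-column feature + block(input)
        return self.alpha * x_prev_col + self.block(x_in)

    def inverse(self, x_out: torch.Tensor, x_in: torch.Tensor) -> torch.Tensor:
        # exact inversion: the previous column's feature is recoverable from the
        # output, so it need not be kept in memory as more columns are stacked
        return (x_out - self.block(x_in)) / self.alpha


if __name__ == "__main__":
    unit = ReversibleColumnUnit(dim=64)
    x_prev = torch.randn(2, 16, 64)  # same-level feature from the previous column
    x_in = torch.randn(2, 16, 64)    # input feature entering this unit's block
    out = unit(x_prev, x_in)
    recovered = unit.inverse(out, x_in)
    assert torch.allclose(recovered, x_prev, atol=1e-5)

Because the residual form is exactly invertible, information passes between columns without loss; this lossless propagation is the mechanism the abstract credits for gradually disentangling low-level and semantic features across columns.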