CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Autoencoder
Feature (linguistics)
Action Recognition
Feature Learning
DOI:
10.48550/arxiv.2301.06018
Publication Date:
2023-01-01
AUTHORS (6)
ABSTRACT
Contrastive Masked Autoencoder (CMAE), a new self-supervised framework, has shown potential for learning expressive feature representations in visual image recognition. This work shows that CMAE also generalizes well to video action recognition without modifying the architecture or the loss criterion. By directly replacing the original pixel shift with a temporal shift, our CMAE for video action recognition, CMAE-V for short, can generate stronger feature representations than its counterpart based on pure masked autoencoders. Notably, CMAE-V, with a hybrid architecture, can achieve 82.2% and 71.6% top-1 accuracy on the Kinetics-400 and Something-Something V2 datasets, respectively. We hope this report can provide some informative inspiration for future work.
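The abstract does not say how the temporal shift is implemented. As a rough illustration only, one plausible reading is that the two views fed to the contrastive branch are clips drawn from the same video at slightly different start times. The sketch below samples frame indices for such a pair. The function name, its parameters (clip_len, stride, max_shift), and the clamping rule are all assumptions made for this example and are not taken from the paper.

```python
import random


def sample_shifted_clips(num_frames, clip_len, stride, max_shift, rng=None):
    """Hypothetical helper: frame indices for an anchor clip and a shifted clip.

    Assumption: "temporal shift" means offsetting the clip start by a small
    random number of frames; the paper may define it differently.
    """
    if max_shift < 0:
        raise ValueError("max_shift must be non-negative")
    rng = rng or random.Random()

    # Number of raw frames covered by one clip at the given stride.
    span = (clip_len - 1) * stride + 1
    if span > num_frames:
        raise ValueError("video is too short for the requested clip")

    # Pick the anchor start so the whole clip fits inside the video.
    last_start = num_frames - span
    start = rng.randint(0, last_start)

    # Draw a shift in [-max_shift, max_shift], clamped so the shifted
    # clip also stays inside the video.
    lowest = max(-max_shift, -start)
    highest = min(max_shift, last_start - start)
    shift = rng.randint(lowest, highest)

    anchor = [start + i * stride for i in range(clip_len)]
    shifted = [index + shift for index in anchor]
    return anchor, shifted, shift


if __name__ == "__main__":
    anchor, shifted, shift = sample_shifted_clips(
        num_frames=64, clip_len=16, stride=2, max_shift=8
    )
    print("shift:", shift)
    print("anchor:", anchor)
    print("shifted:", shifted)
```

In a full pipeline the two index lists would be used to load the frames for each view before masking and encoding; those stages are outside what the abstract describes and are omitted here.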