Shuffle-invariant Network for Action Recognition in Videos

Keywords: Feature (linguistics), Feature Learning
DOI: 10.1145/3485665 Publication Date: 2022-03-04T10:26:32Z
ABSTRACT
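The local key features in video are important for improving the accuracy of human action recognition. However, most end-to-end methods focus on global feature learning from videos, while few works consider the enhancement of the local information in a feature. In this article, we discuss how to automatically enhance the ability to discriminate the local information in an action feature and improve the accuracy of action recognition. To address these problems, we assume that the critical level of each region for the action recognition task is different and will not change with the region location shuffle. We therefore propose a novel action recognition method called the shuffle-invariant network. In the proposed method, the shuffled video is generated by regular region cutting and random confusion to enhance the input data. The proposed network adopts the multitask framework, which includes one feature backbone network and three task branches: local critical feature shuffle-invariant learning, adversarial learning, and an action classification network. To enhance the local features, the feature response of each region is predicted by a local critical feature learning network. To train this network, an L1-based critical feature shuffle-invariant loss is defined to ensure that the ordered feature response list of these regions remains unchanged after the region location shuffle. Then, adversarial learning is applied to eliminate the noise caused by the region shuffle. Finally, the action classification network combines these two tasks to jointly guide the training of the feature backbone network and obtain more effective action features. In the testing phase, only the action classification network is applied to identify the action category of the input video. We verify the proposed method on the HMDB51 and UCF101 action datasets. Several ablation experiments are constructed to verify the effectiveness of each module. The experimental results show that our approach achieves state-of-the-art performance.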
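The abstract gives no implementation details, so the following is only a minimal sketch of its two central ideas: region shuffling of the input video and an L1 penalty that compares each region's response before and after the shuffle. Everything here is an illustrative assumption rather than the authors' code: the function names, the use of a single permutation shared by all frames, single-channel frames stored as nested lists, and the mean-intensity `region_responses` function, which merely stands in for the learned local critical feature network.

```python
import random


def cut_regions(frame, grid):
    """Split a 2-D frame (list of rows) into grid*grid blocks, row-major."""
    rh, rw = len(frame) // grid, len(frame[0]) // grid
    return [
        [row[gx * rw:(gx + 1) * rw] for row in frame[gy * rh:(gy + 1) * rh]]
        for gy in range(grid)
        for gx in range(grid)
    ]


def paste_regions(regions, grid):
    """Inverse of cut_regions: reassemble row-major blocks into one frame."""
    frame = []
    for gy in range(grid):
        row_blocks = regions[gy * grid:(gy + 1) * grid]
        for r in range(len(row_blocks[0])):
            frame.append([v for block in row_blocks for v in block[r]])
    return frame


def shuffle_video(video, grid, rng):
    """Permute region locations; the same permutation is reused for every
    frame (an assumption). Position i of the output holds region perm[i]."""
    perm = list(range(grid * grid))
    rng.shuffle(perm)
    shuffled = []
    for frame in video:
        regions = cut_regions(frame, grid)
        shuffled.append(paste_regions([regions[p] for p in perm], grid))
    return shuffled, perm


def region_responses(video, grid):
    """Hypothetical stand-in for a learned response: the mean intensity of
    each region, averaged over all frames."""
    sums = [0.0] * (grid * grid)
    for frame in video:
        for i, block in enumerate(cut_regions(frame, grid)):
            sums[i] += sum(sum(r) for r in block) / (len(block) * len(block[0]))
    return [s / len(video) for s in sums]


def shuffle_invariant_l1(resp_orig, resp_shuf, perm):
    """Mean absolute gap between each region's response before the shuffle
    and the response of the same region at its new location."""
    gaps = [abs(resp_orig[p] - resp_shuf[i]) for i, p in enumerate(perm)]
    return sum(gaps) / len(gaps)


# Toy usage: 3 frames of 4x4 pixels, cut into a 2x2 grid of regions.
rng = random.Random(0)
video = [[[float(rng.randrange(256)) for _ in range(4)] for _ in range(4)]
         for _ in range(3)]
shuffled, perm = shuffle_video(video, grid=2, rng=rng)
loss = shuffle_invariant_l1(region_responses(video, 2),
                            region_responses(shuffled, 2), perm)
```

Because the stand-in response depends only on a region's content, the penalty vanishes in this toy case. A learned response network would generally be sensitive to region position, and a loss of this kind would push it toward location-independent responses.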
REFERENCES (42)
CITATIONS (14)