Semantic Adversarial Network with Multi-scale Pyramid Attention for Video Classification
DOI: 10.48550/arxiv.1903.02155
Publication Date: 2019-01-01
AUTHORS (5)
ABSTRACT
Two-stream architectures have shown strong performance on video classification tasks. The key idea is to learn spatio-temporal features by fusing convolutional networks spatially and temporally. However, such architectures have several problems. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information in video data. Third, they lack explicit semantic guidance, which greatly decreases performance. In this paper, we propose a new two-stream based deep framework that discovers spatio-temporal features only from RGB frames; moreover, a multi-scale pyramid attention (MPA) layer and a semantic adversarial learning (SAL) module are introduced and integrated into our framework. MPA enables the network to capture global features and generate a comprehensive representation of the video, and SAL makes the learned representation gradually approximate the real semantics in an adversarial manner. Experimental results on two public benchmarks demonstrate that our method achieves state-of-the-art performance on standard datasets.
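To illustrate the general idea behind a multi-scale pyramid attention layer, here is a minimal NumPy sketch. It is a hypothetical illustration, not the authors' implementation: the pooling kernel sizes, the scale-scoring rule, and the fusion by softmax-weighted averaging are all assumptions made for the example.

```python
import numpy as np

def avg_pool2d(x, k):
    # x: (C, H, W); non-overlapping average pooling with kernel size k
    C, H, W = x.shape
    return x[:, : H // k * k, : W // k * k].reshape(
        C, H // k, k, W // k, k
    ).mean(axis=(2, 4))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def multi_scale_pyramid_attention(x, scales=(1, 2, 4)):
    """Hypothetical sketch: pool the feature map at several scales,
    score each scale by its mean activation, and fuse the per-scale
    global descriptors with softmax attention weights."""
    C = x.shape[0]
    descriptors = []
    for s in scales:
        pooled = avg_pool2d(x, s)                               # (C, H//s, W//s)
        descriptors.append(pooled.reshape(C, -1).mean(axis=1))  # (C,)
    D = np.stack(descriptors)          # (num_scales, C)
    weights = softmax(D.mean(axis=1))  # one attention weight per scale
    return weights @ D                 # (C,) attended global feature

x = np.random.rand(8, 16, 16)  # toy feature map: 8 channels, 16x16
feat = multi_scale_pyramid_attention(x)
print(feat.shape)  # (8,)
```

The pyramid of pooled maps gives the layer access to context at several spatial granularities, which is what lets a single global descriptor retain both coarse scene layout and finer local detail.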