Learning continuous temporal embedding of videos using pattern theory
DOI: 10.1016/j.patrec.2021.02.025
Publication Date: 2021-03-18
ABSTRACT
Visual Question Answering (VQA) is a challenging task in artificial intelligence and has received increasing attention from both the computer vision and natural language processing communities. Joint embedding learning for VQA suffers from the background noise of images and text. Learning a continuous temporal embedding is a potential solution for extracting both visual and textual elements with a proper temporal structure. In this paper, we propose a continuous temporal embedding model based on pattern theory (CTE-PT) that fully expresses the atomism and combination principles of Grenander's pattern theory. First, we generate atomic actions from videos and denote them as generators, which reflects the atomism of pattern theory. Second, we design the CTE-PT model to discover the discriminative configuration of a video, which reflects the combination of atomic actions in pattern theory. Within CTE-PT, a configuration proposal module initially removes background information, and a configuration interpretation module minimizes the interpretive energy of the continuous temporal embedding. We estimate the energy over each pair in the embedding sequence and optimize it with the Viterbi algorithm. Experimental results show that our CTE-PT model outperforms the baseline C3D+LSTM model on the Olympic Sports, UCF101, and HMDB51 datasets, which demonstrates the effectiveness of mining the common continuous temporal embedding as a class-specific configuration for the activity discriminator.
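The energy-minimization step described in the abstract can be illustrated with a short sketch. This is a minimal, assumed implementation rather than the authors' code: the names `pairwise_energy` and `viterbi_configuration` are hypothetical, and negative cosine similarity between consecutive generator embeddings stands in for the paper's interpretive energy term.

```python
import numpy as np

def pairwise_energy(g_prev, g_next):
    """Hypothetical interpretive energy between two generator
    embeddings: lower when consecutive atomic actions are more
    compatible (here, negative cosine similarity as a stand-in)."""
    return -np.dot(g_prev, g_next) / (
        np.linalg.norm(g_prev) * np.linalg.norm(g_next) + 1e-8)

def viterbi_configuration(candidates):
    """Select one generator per time step so that the summed
    pairwise energy of the resulting configuration is minimal.

    candidates: list of length T; candidates[t] is an array of
    shape (K_t, D) holding K_t candidate generator embeddings
    (e.g. atomic-action features surviving the proposal stage).
    Returns (best_indices, best_energy).
    """
    T = len(candidates)
    # cost[t][k]: minimal energy of any configuration ending in
    # candidate k at step t; back[t][k]: argmin predecessor index.
    cost = [np.zeros(len(candidates[0]))]
    back = [None]
    for t in range(1, T):
        K_prev, K_cur = len(candidates[t - 1]), len(candidates[t])
        step = np.empty((K_prev, K_cur))
        for i in range(K_prev):
            for j in range(K_cur):
                step[i, j] = pairwise_energy(candidates[t - 1][i],
                                             candidates[t][j])
        total = cost[-1][:, None] + step
        back.append(total.argmin(axis=0))
        cost.append(total.min(axis=0))
    # Backtrack the minimum-energy path.
    idx = [int(cost[-1].argmin())]
    for t in range(T - 1, 0, -1):
        idx.append(int(back[t][idx[-1]]))
    idx.reverse()
    return idx, float(cost[-1].min())

# Toy usage: 4 time steps, 3 candidate generators each, D = 16.
rng = np.random.default_rng(0)
cands = [rng.normal(size=(3, 16)) for _ in range(4)]
path, energy = viterbi_configuration(cands)
print(path, energy)
```

Because the total energy decomposes over consecutive pairs, dynamic programming of this form recovers the globally minimal configuration in O(T·K²) pairwise evaluations for T time steps and K candidates per step, which is the structural property the abstract's Viterbi step relies on.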