Boosting Video Representation Learning with Multi-Faceted Integration

Keywords: Representation Learning, Feature Learning, Boosting, Closed Captioning
DOI: 10.48550/arxiv.2201.04023 Publication Date: 2022-01-01
ABSTRACT
Video content is multifaceted, consisting of objects, scenes, interactions, and actions. Existing datasets mostly label only one of these facets for model training, resulting in a video representation that is biased toward the facet emphasized by the training dataset. There is no study yet on how to learn a video representation from multifaceted labels, or on whether such multifaceted information is helpful for representation learning. In this paper, we propose a new learning framework, MUlti-Faceted Integration (MUFI), to aggregate facets from different datasets and learn a representation that reflects the full spectrum of video content. Technically, MUFI formulates the problem as visual-semantic embedding learning, which explicitly maps video representations into a rich semantic embedding space and jointly optimizes them from two perspectives. One capitalizes on the intra-facet supervision between each video and its own label descriptions; the second predicts the "semantic representation" of each video from other facets as inter-facet supervision. Extensive experiments demonstrate that training a 3D CNN via our MUFI framework on the union of four large-scale video datasets plus two image datasets leads to superior video representation capability. The 3D CNN pre-learnt with MUFI also shows clear improvements over other approaches on several downstream video applications. More remarkably, MUFI achieves 98.1%/80.9% on UCF101/HMDB51 for action recognition and 101.5% in terms of CIDEr-D score on MSVD for video captioning.
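To make the two-perspective objective concrete, here is a minimal numpy sketch of a MUFI-style loss combining intra-facet alignment with inter-facet prediction. The function name, the choice of cosine distance for the intra-facet term, the mean-squared-error form of the inter-facet term, and the weighting factor `alpha` are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize rows to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def mufi_style_loss(video_emb, own_label_emb, other_facet_emb, alpha=0.5):
    """Toy two-term objective in the spirit of MUFI (details are assumptions):

    - intra-facet term: pull each video embedding toward the embedding of
      its own facet's label description (here: 1 - cosine similarity).
    - inter-facet term: regress the video embedding onto the "semantic
      representation" derived from other facets (here: mean squared error).

    `alpha` is a hypothetical weight balancing the two terms.
    """
    v = l2_normalize(video_emb)
    t = l2_normalize(own_label_emb)
    intra = 1.0 - np.sum(v * t, axis=-1)              # per-sample cosine distance
    inter = np.mean((video_emb - other_facet_emb) ** 2, axis=-1)
    return np.mean(intra + alpha * inter)

# Toy usage: a batch of 4 videos with 8-dimensional embeddings.
rng = np.random.default_rng(0)
videos = rng.normal(size=(4, 8))
labels = rng.normal(size=(4, 8))       # own-facet label embeddings
others = rng.normal(size=(4, 8))       # inter-facet semantic targets
loss = mufi_style_loss(videos, labels, others)
```

In this sketch the intra-facet term supervises each video against its own dataset's labels, while the inter-facet term injects supervision from the other datasets' facets, mirroring the joint optimization described in the abstract.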