Unsupervised Temporal Video Grounding with Deep Semantic Clustering
FOS: Computer and information sciences
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Fields of Research: 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering
DOI: 10.1609/aaai.v36i2.20060
Publication Date: 2022-07-04T10:35:34Z
AUTHORS (8)
ABSTRACT
Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query. Although existing works have achieved decent results on this task, they rely heavily on abundant video-query paired data, which is expensive to collect in real-world scenarios. In this paper, we explore whether a video grounding model can be learned without any paired annotations. To the best of our knowledge, this is the first work to address TVG in an unsupervised setting. Since no paired supervision is available, we propose a novel Deep Semantic Clustering Network (DSCNet) that leverages semantic information from the whole query set to compose the possible activity in each video for grounding. Specifically, we first develop a language semantic mining module that extracts implicit semantic features from the whole query set. These language semantic features then guide the composition of the activity in the video via a video-based semantic aggregation module. Finally, a foreground attention branch filters out redundant background activities and refines the grounding results. To validate the effectiveness of DSCNet, we conduct experiments on the ActivityNet Captions and Charades-STA datasets. The results demonstrate that DSCNet achieves competitive performance and even outperforms most weakly-supervised approaches.
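The abstract outlines three components: a language semantic mining module over the whole query set, a video-based semantic aggregation module, and a foreground attention branch. The sketch below is a minimal, illustrative PyTorch rendering of that pipeline; the class names, feature dimensions, soft prototype assignment, and attention-based aggregation are assumptions made for illustration and are not taken from the paper's actual implementation.

```python
# Hypothetical sketch of the three components described in the abstract.
# All module names, dimensions, and wiring are assumptions, not DSCNet's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LanguageSemanticMining(nn.Module):
    """Distills K latent semantic prototypes from features of the whole query set."""
    def __init__(self, query_dim=300, num_prototypes=16, hidden_dim=256):
        super().__init__()
        self.proj = nn.Linear(query_dim, hidden_dim)
        # Learnable semantic prototypes shared across the dataset (assumed formulation).
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, hidden_dim))

    def forward(self, query_feats):               # (num_queries, query_dim)
        q = self.proj(query_feats)                 # (num_queries, hidden_dim)
        # Soft-assign every query to the prototypes (a simple clustering surrogate).
        assign = F.softmax(q @ self.prototypes.t(), dim=-1)            # (num_queries, K)
        # Re-estimate prototypes as assignment-weighted averages of the queries.
        mined = assign.t() @ q / (assign.sum(0).unsqueeze(-1) + 1e-6)  # (K, hidden_dim)
        return mined


class VideoSemanticAggregation(nn.Module):
    """Lets video clips attend over the mined language semantics to compose an activity."""
    def __init__(self, video_dim=1024, hidden_dim=256):
        super().__init__()
        self.vproj = nn.Linear(video_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)

    def forward(self, clip_feats, semantics):      # (T, video_dim), (K, hidden_dim)
        v = self.vproj(clip_feats).unsqueeze(0)    # (1, T, hidden_dim)
        s = semantics.unsqueeze(0)                 # (1, K, hidden_dim)
        # Each clip queries the semantic prototypes; output is semantics-aware clip features.
        out, _ = self.attn(v, s, s)                # (1, T, hidden_dim)
        return out.squeeze(0)                      # (T, hidden_dim)


class ForegroundAttention(nn.Module):
    """Scores each clip as foreground vs. background to refine the grounding result."""
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, clip_feats):                 # (T, hidden_dim)
        return torch.sigmoid(self.score(clip_feats)).squeeze(-1)       # (T,)


if __name__ == "__main__":
    queries = torch.randn(500, 300)   # pooled features for the whole query set (dummy data)
    clips = torch.randn(64, 1024)     # clip features for one video (dummy data)
    semantics = LanguageSemanticMining()(queries)
    fused = VideoSemanticAggregation()(clips, semantics)
    fg = ForegroundAttention()(fused)
    print(fg.shape)  # per-clip foreground probability, used to filter background activities
```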