Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

DOI: 10.1609/aaai.v33i01.33018393
Publication date: 2019-08-21
ABSTRACT
The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a presegmented video, and they inevitably suffer from the exhaustively enumerated candidates. To alleviate this problem, we formulate the task as a problem of sequential decision making by learning an agent that regulates the temporal grounding boundaries progressively based on its policy. Specifically, we propose a reinforcement learning based framework improved by multi-task learning, and it shows steady performance gains when additional supervised boundary information is considered during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet'18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.
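
The following is a minimal, illustrative Python sketch of the sequential boundary-adjustment loop the abstract describes. The discrete action set, the 10%-of-video step size, the IoU-improvement reward, and the random stand-in policy are all assumptions made for illustration; they are not the paper's actual action space, reward design, or learned policy. The cap of 10 steps per episode mirrors the abstract's claim of observing only 10 or fewer clips per video.

import random

# Hypothetical action set for progressively adjusting a temporal window
# (assumed for illustration; the paper's action space may differ).
ACTIONS = ["start_left", "start_right", "end_left", "end_right", "stop"]

def temporal_iou(a, b):
    """Temporal IoU between two (start, end) segments in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0.0 else 0.0

def apply_action(window, action, delta, video_len):
    """Shift one boundary by delta, keeping 0 <= start < end <= video_len."""
    s, e = window
    if action == "start_left":
        s -= delta
    elif action == "start_right":
        s += delta
    elif action == "end_left":
        e -= delta
    elif action == "end_right":
        e += delta
    s = max(0.0, min(s, e - 1e-3))
    e = min(video_len, max(e, s + 1e-3))
    return (s, e)

def ground(policy, query, video_len, gt_segment, max_steps=10):
    """One episode: start from the whole video and refine the window.

    The reward here is shaped as the IoU improvement against the
    ground-truth segment (an assumption, not the paper's exact reward).
    """
    window = (0.0, video_len)
    trajectory = []
    for _ in range(max_steps):
        action = policy(query, window)
        if action == "stop":
            break
        new_window = apply_action(window, action, delta=0.1 * video_len,
                                  video_len=video_len)
        reward = (temporal_iou(new_window, gt_segment)
                  - temporal_iou(window, gt_segment))
        trajectory.append((window, action, reward))
        window = new_window
    return window, trajectory

def random_policy(query, window):
    """Stand-in for the learned policy network."""
    return random.choice(ACTIONS)

final_window, traj = ground(random_policy, query="person opens a door",
                            video_len=100.0, gt_segment=(20.0, 45.0))
print(final_window, len(traj))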