NFDI4DS | UHH-SEMS - Publication Details

Localizing Moments in Long Video Via Multimodal Guidance

FOS: Computer and information sciences Computer Science - Machine Learning 0209 industrial biotechnology Artificial Intelligence (cs.AI) Computer Science - Artificial Intelligence Computer Vision and Pattern Recognition (cs.CV) Computer Science - Computer Vision and Pattern Recognition 0202 electrical engineering, electronic engineering, information engineering 02 engineering and technology Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2302.13372 Publication Date: 2023-10-01

Abstract Supplemental Material References Cited by

AUTHORS (5)

Barrios, Wayner

Soldan, Mattia

Ceballos-Arroyo, ...

Heilbron, Fabian ...

Ghanem, Bernard

ABSTRACT

The recent introduction of the large-scale, long-form MAD and Ego4D datasets has enabled researchers to investigate the performance of current state-of-the-art methods for video grounding in the long-form setup, with interesting findings: current grounding methods alone fail at tackling this challenging task and setup due to their inability to process long video sequences. In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows. We design a guided grounding framework consisting of a Guidance Model and a base grounding model. The Guidance Model emphasizes describable windows, while the base grounding model analyzes short temporal windows to determine which segments accurately match a given language query. We offer two designs for the Guidance Model: Query-Agnostic and Query-Dependent, which balance efficiency and accuracy. Experiments demonstrate that our proposed method outperforms state-of-the-art models by 4.1% in MAD and 4.52% in Ego4D (NLQ), respectively. Code, data and MAD's audio features necessary to reproduce our experiments are available at: https://github.com/waybarrios/guidance-based-video-grounding.

SUPPLEMENTAL MATERIAL

Coming soon ....

REFERENCES ()

CITATIONS ()

EXTERNAL LINKS

OPENAIRE - Products

PlumX Metrics

Localizing Moments in Long Video Via Multimodal Guidance

RECOMMENDATIONS

FAIR ASSESSMENT

Coming soon ....

JUPYTER LAB

Coming soon ....