UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2404.04933
Publication Date: 2024-04-07
ABSTRACT
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos. Despite focusing on different events, we observe that they have a significant connection. For instance, most descriptions in MR involve multiple actions from TAD. In this paper, we aim to investigate the potential synergy between TAD and MR. Firstly, we propose a unified architecture, termed Unified Moment Detection (UniMD), for both TAD and MR. It transforms the inputs of the two tasks, namely actions for TAD or events for MR, into a common embedding space, and utilizes novel query-dependent decoders to generate a uniform output of classification scores and temporal segments. Secondly, we explore the efficacy of two task fusion learning approaches, pre-training and co-training, in order to enhance the mutual benefits between TAD and MR. Extensive experiments demonstrate that the proposed task fusion learning scheme enables the two tasks to help each other and outperform their separately trained counterparts. Impressively, UniMD achieves state-of-the-art results on three paired datasets: Ego4D, Charades-STA, and ActivityNet. Our code will be released at https://github.com/yingsen1/UniMD.
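ILLUSTRATIVE SKETCH
The abstract describes a query-conditioned design: action names (TAD) and sentences (MR) are mapped into a common embedding space, and query-dependent decoders emit a classification score and temporal segments per snippet. The PyTorch sketch below only illustrates that interface under stated assumptions; the module names, feature dimensions, and the simple multiplicative fusion are our own placeholders, not the authors' implementation (see the released code for the actual architecture).

# Minimal sketch of a query-conditioned decoder interface, assuming
# precomputed video snippet features and a text query embedding
# (e.g. from a CLIP-style text encoder). All design details here are
# assumptions for illustration, not the UniMD implementation.
import torch
import torch.nn as nn

class QueryDependentDecoders(nn.Module):
    def __init__(self, video_dim: int = 512, query_dim: int = 512, hidden: int = 256):
        super().__init__()
        # Project video snippets and the text query into a shared space.
        self.video_proj = nn.Linear(video_dim, hidden)
        self.query_proj = nn.Linear(query_dim, hidden)
        # Classification decoder: per-snippet score for the given query.
        self.cls_head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Localization decoder: per-snippet (start, end) offsets in snippet units.
        self.reg_head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, video_feats: torch.Tensor, query_emb: torch.Tensor):
        # video_feats: (B, T, video_dim) snippet features from a video backbone.
        # query_emb:   (B, query_dim) embedding of an action name (TAD) or a sentence (MR).
        v = self.video_proj(video_feats)              # (B, T, hidden)
        q = self.query_proj(query_emb).unsqueeze(1)   # (B, 1, hidden)
        fused = v * q                                 # multiplicative fusion (assumption)
        scores = self.cls_head(fused).squeeze(-1)     # (B, T) classification scores
        offsets = self.reg_head(fused).relu()         # (B, T, 2) distances to segment start/end
        return scores, offsets

if __name__ == "__main__":
    model = QueryDependentDecoders()
    video = torch.randn(2, 128, 512)   # 2 videos, 128 snippets each
    query = torch.randn(2, 512)        # one query per video (action or sentence)
    scores, offsets = model(video, query)
    print(scores.shape, offsets.shape)  # torch.Size([2, 128]) torch.Size([2, 128, 2])

Because both tasks share this single input/output format, the same network can in principle be pre-trained or co-trained on TAD and MR data, which is the task fusion setting the abstract refers to.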