Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

DOI: 10.48550/arxiv.2401.00701 Publication Date: 2024-01-01
ABSTRACT
In recent years, text-to-video retrieval methods based on CLIP have experienced rapid development. The primary direction of evolution is to exploit a much wider gamut of visual and textual cues to achieve alignment. Concretely, methods with impressive performance often design a heavy fusion block for sentence (words)-video (frames) interaction, regardless of the prohibitive computation complexity. Nevertheless, these approaches are not optimal in terms of feature utilization and retrieval efficiency. To address this issue, we adopt multi-granularity visual feature learning, ensuring the model's comprehensiveness in capturing visual content features spanning from abstract to detailed levels during the training phase. To better leverage the multi-granularity features, we devise a two-stage retrieval architecture for the retrieval phase. This solution ingeniously balances the coarse and fine granularity of the retrieval content. Moreover, it also strikes a harmonious equilibrium between retrieval effectiveness and efficiency. Specifically, in the training phase, we design a parameter-free text-gated interaction block (TIB) for fine-grained video representation learning and embed an extra Pearson Constraint to optimize cross-modal representation learning. In the retrieval phase, we use coarse-grained video representations for fast recall of top-k candidates, which are then reranked by fine-grained video representations. Extensive experiments on four benchmarks demonstrate our efficiency and effectiveness. Notably, our method achieves performance comparable to current state-of-the-art methods while being nearly 50 times faster.
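To make the two-stage design concrete, below is a minimal PyTorch sketch of coarse-to-fine retrieval, assuming normalized CLIP-style embeddings: one coarse video-level vector per video for fast top-k recall, and per-frame vectors for reranking. The function names, tensor shapes, the softmax frame weighting (a simple parameter-free stand-in for the paper's TIB), and the `1 - correlation` form of the Pearson constraint are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of two-stage coarse-to-fine text-to-video retrieval.
# Shapes, names, and the toy data below are assumptions for illustration.
import torch
import torch.nn.functional as F


def pearson_constraint(sim_a: torch.Tensor, sim_b: torch.Tensor) -> torch.Tensor:
    """Illustrative Pearson-style loss: push two cross-modal similarity
    vectors toward linear correlation by minimizing 1 - Pearson r.
    (The paper's exact formulation may differ; this is an assumption.)"""
    a = sim_a - sim_a.mean()
    b = sim_b - sim_b.mean()
    corr = (a * b).sum() / (a.norm() * b.norm() + 1e-8)
    return 1.0 - corr


def coarse_to_fine_retrieval(text_emb, video_coarse, video_fine, k=10):
    """Stage 1: score all N videos with cheap coarse embeddings, keep top-k.
    Stage 2: rerank only those k candidates with fine-grained per-frame
    embeddings, aggregated by a text-conditioned softmax over frames
    (a parameter-free stand-in for the text-gated interaction block, TIB).

    text_emb:     (D,)       one query sentence embedding
    video_coarse: (N, D)     one global embedding per video
    video_fine:   (N, T, D)  T frame embeddings per video
    """
    text_emb = F.normalize(text_emb, dim=-1)

    # --- Stage 1: fast recall via coarse video-level cosine similarity ---
    coarse_sim = F.normalize(video_coarse, dim=-1) @ text_emb        # (N,)
    _, topk_idx = coarse_sim.topk(k)

    # --- Stage 2: fine-grained rerank of the k candidates only ---
    frames = F.normalize(video_fine[topk_idx], dim=-1)               # (k, T, D)
    frame_sim = frames @ text_emb                                    # (k, T)
    weights = frame_sim.softmax(dim=-1)                              # text-gated frame weights
    fine_sim = (weights * frame_sim).sum(dim=-1)                     # (k,)

    order = fine_sim.argsort(descending=True)
    return topk_idx[order], fine_sim[order]


if __name__ == "__main__":
    torch.manual_seed(0)
    N, T, D = 1000, 12, 512                       # toy gallery size
    text = torch.randn(D)
    coarse = torch.randn(N, D)
    fine = torch.randn(N, T, D)
    ids, scores = coarse_to_fine_retrieval(text, coarse, fine, k=10)
    print("reranked candidates:", ids.tolist())
    print("pearson loss demo:", pearson_constraint(torch.randn(32), torch.randn(32)).item())
```

The efficiency claim follows from this structure: stage 1 costs one dot product per video over the whole gallery, while the expensive frame-level interaction in stage 2 touches only the k recalled candidates, which is how the method can approach heavy-fusion accuracy at a fraction of the query cost.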