- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Video Analysis and Summarization
- Domain Adaptation and Few-Shot Learning
Tsinghua University
2024
Video moment retrieval (VMR) is a cutting-edge vision-language task that locates a segment in a video according to a query. Although existing methods have achieved significant performance, they assume that training and testing samples share the same action types, which hinders real-world application. In this paper, we specifically consider a new problem: video moment retrieval by queries with unseen actions. We propose a plug-and-play structure, Routing Evidence (RE), multiple evidence-learning heads dynamically route one locate sentence an...
Video moment retrieval locates a moment specified by a sentence query. Recent approaches have made remarkable advancements with large-scale video-sentence annotations. These annotations require extensive human labor and expertise, which motivates solving the task in an unsupervised fashion. Generating pseudo-supervision from videos is an effective strategy. With the power of a pre-trained model, we introduce knowledge into constructing pseudo-supervision. The main technical challenge is improving diversity and alleviating noise...