- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Recommender Systems and Techniques
- Topic Modeling
- Advanced Image and Video Retrieval Techniques
- Advanced Graph Neural Networks
- Human Pose and Action Recognition
- Natural Language Processing Techniques
- Caching and Content Delivery
- Domain Adaptation and Few-Shot Learning
- Human Mobility and Location-Based Analysis
Shandong University
2019-2020
Personalized hashtag recommendation methods aim to suggest to users hashtags to annotate, categorize, and describe their posts. The hashtags that a user provides for a post (e.g., a micro-video) are the ones that, in her mind, describe well the post content she is interested in. This means we should consider both users' preferences on post contents and their personal understanding of hashtags. Most existing methods rely on modeling either the interactions between hashtags and posts or those between users and hashtags for recommendation. These methods have not explored the complicated interactions among users,...
With rising awareness of environmental protection and recycling, second-hand trading platforms have attracted increasing attention in recent years. The interaction data on these platforms, consisting of sufficient interactions per user but rare interactions per item, differs from that on traditional platforms. Therefore, building successful recommendation systems for second-hand trading platforms requires balancing the modeling of items' and users' preferences and mitigating the adverse effects of sparsity, which makes recommendation especially challenging. Accordingly, we...
In recent years, the explosion of web videos has made text-video retrieval increasingly essential and popular for video filtering, recommendation, and search. Text-video retrieval aims to rank relevant texts/videos higher than irrelevant ones. The core of this task is to precisely measure the cross-modal similarity between texts and videos. Recently, contrastive learning methods have shown promising results for text-video retrieval, most of which focus on the construction of positive and negative pairs to learn text and video representations. Nevertheless, they do...
With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video representation to classify scenes in videos. Due to the diversity and complexity of video contents in realistic scenarios, this task remains a challenge. Most existing works identify scenes for videos only from visual or textual information in a temporal perspective, ignoring the valuable information hidden in single frames, while several earlier studies...
We present a Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards effective and efficient zero-shot video-text retrieval, dubbed M2-RAAP. Building upon popular image-text models like CLIP, most current adaptation-based pre-training methods are confronted by three major issues, i.e., a noisy data corpus, time-consuming pre-training, and limited performance gain. To this end, we conduct a comprehensive study including four critical steps in pre-training. Specifically, we investigate 1)...
The user base of short video apps has experienced unprecedented growth in recent years, resulting in a significant demand for video content analysis. In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, and its primary challenge is to bridge the modality gap. Nevertheless, most existing approaches treat texts merely as discrete tokens and neglect their syntax structures. Moreover, the abundant spatial and temporal clues in videos are often...