- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Text and Document Classification Technologies
Zhejiang University of Science and Technology
2022-2024
Zhejiang University
2023
Ningbo University of Technology
2022
Current state-of-the-art image-text matching methods implicitly align visual-semantic fragments, such as regions in images and words in sentences, and adopt a cross-attention mechanism to discover fine-grained cross-modal semantic correspondence. However, cross-attention may introduce redundant or irrelevant region-word alignments, degrading retrieval accuracy and limiting efficiency. Although many researchers have made progress in mining meaningful alignments and thus improving accuracy, the problem of poor efficiency remains...
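To make the cross-attention baseline described above concrete, here is a minimal sketch in plain Python. It is not the authors' model: the function names, the softmax temperature, and the mean pooling over words are assumptions chosen for illustration, and feature vectors are assumed to be non-zero. Each word attends to every region, so the cost grows with the product of region and word counts, and even irrelevant regions receive some attention weight.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature value is an assumption."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_score(regions, words, temperature=0.1):
    """Image-text similarity from word-to-region cross-attention (illustrative)."""
    word_scores = []
    for word in words:
        # Every word is compared against every region: O(len(words) * len(regions)).
        sims = [cosine(word, region) for region in regions]
        attn = softmax(sims, temperature)
        # Attention-weighted mix of all regions, including irrelevant ones.
        attended = [
            sum(a * region[d] for a, region in zip(attn, regions))
            for d in range(len(word))
        ]
        word_scores.append(cosine(word, attended))
    # Mean pooling over words is one common choice, assumed here.
    return sum(word_scores) / len(word_scores)
```

Calling `cross_attention_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])` returns a value just below 1, because the unrelated second region still contributes a small share of the attended vector.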
Multi-label classification is a task with diverse applications, but current algorithms rely heavily on accurately labeled data, which makes data collection time-consuming and labor-intensive. However, multi-label classification with partial labels presents significant challenges. In this study, we propose Multi-modal Contextual Prompt Learning (MCPL), a novel approach that leverages large-scale visual-language models and exploits the strong image-text alignment in CLIP to address the scarcity of label annotations. We...
Triplet loss is widely used as the objective function in image-text retrieval tasks. However, because all triplets are treated equally, triplet loss suffers from slow convergence and other unsatisfactory performance. In this article, we propose solutions that weight triplets appropriately according to the relative similarities among training samples. Specifically, we present three weighting functions that assign an appropriate weight to the selected informative triplets to accelerate convergence. We evaluate our approach on...
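The idea of weighting triplets by relative similarity can be sketched as follows. The abstract does not spell out the three weighting functions, so the logistic weight below is a hypothetical stand-in rather than the authors' formulation; the margin and temperature values are likewise assumptions.

```python
import math

def triplet_loss(s_pos, s_neg, margin=0.2):
    """Standard hinge triplet loss on similarity scores."""
    return max(0.0, margin - s_pos + s_neg)

def relative_weight(s_pos, s_neg, temperature=0.1):
    """Hypothetical weight: logistic function of the gap s_neg - s_pos.

    Harder triplets (negative scored close to or above the positive)
    receive weights nearer to 1.
    """
    return 1.0 / (1.0 + math.exp(-(s_neg - s_pos) / temperature))

def weighted_triplet_loss(triplets, margin=0.2, temperature=0.1):
    """Average weighted loss over informative (margin-violating) triplets."""
    total = 0.0
    active = 0
    for s_pos, s_neg in triplets:
        loss = triplet_loss(s_pos, s_neg, margin)
        if loss > 0.0:  # keep only triplets that still violate the margin
            total += relative_weight(s_pos, s_neg, temperature) * loss
            active += 1
    return total / active if active else 0.0
```

Here each triplet is a pair `(s_pos, s_neg)` of anchor-positive and anchor-negative similarities. An unweighted loss would count every violating triplet equally; the weight shifts the gradient budget toward the harder ones.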