NFDI4DS | UHH-SEMS - Publication Details

Zhengxin Pan

ORCID: 0000-0003-2003-0728

About

Contact & Profiles

A5005961599

Research Areas

Multimodal Machine Learning Applications
Domain Adaptation and Few-Shot Learning
Advanced Image and Video Retrieval Techniques
Text and Document Classification Technologies

Zhejiang University of Science and Technology
2022-2024

Zhejiang University
2023

Ningbo University of Technology
2022

Fine-grained Image-text Matching by Cross-modal Hard Aligning Network

OPENALEX - Publications

Zhengxin Pan Fangyu Wu Bailing Zhang

Current state-of-the-art image-text matching methods implicitly align the visual-semantic fragments, like regions in images and words in sentences, and adopt a cross-attention mechanism to discover fine-grained cross-modal semantic correspondence. However, cross-attention may bring redundant or irrelevant region-word alignments, degrading retrieval accuracy and limiting efficiency. Although many researchers have made progress in mining meaningful alignments and thus improving accuracy, the problem of poor efficiency remains...

10.1109/cvpr52729.2023.01847 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Multi-modal Contextual Prompt Learning for Multi-label Classification with Partial Labels

OPENALEX - Publications

Rui Wang Zhengxin Pan Fangyu Wu Yifan Lv Bailing Zhang

Multi-label classification is a task with diverse applications, but current algorithms rely heavily on accurately labeled data, leading to time-consuming and labor-intensive data collection. However, multi-label classification with partial labels presents significant challenges. In this study, we propose Multi-modal Contextual Prompt Learning (MCPL), a novel approach that leverages large-scale visual-language models and exploits the strong image-text alignment in CLIP to address the scarcity of label annotations. We...

10.1145/3651671.3651674 article EN 2024-02-02

Kernel triplet loss for image‐text retrieval

OPENALEX - Publications

Zhengxin Pan Fangyu Wu Bailing Zhang

Triplet loss is widely used as the objective function in image‐text retrieval tasks. However, because all triplets are treated equally, triplet loss has a bottleneck problem of slow convergence and other unsatisfactory performance. In this article, we propose solutions that weight triplets appropriately according to relative similarities among training samples. Specifically, we present three functions to assign an appropriate weight to selected informative triplets and accelerate convergence. We evaluate our approach on...

10.1002/cav.2093 article EN Computer Animation and Virtual Worlds 2022-06-01