- Multimodal Machine Learning Applications
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Rough Sets and Fuzzy Logic
- Anatomy and Medical Technology
- 3D Shape Modeling and Analysis
- Data Mining Algorithms and Applications
- Domain Adaptation and Few-Shot Learning
Hong Kong University of Science and Technology
2025
University of Hong Kong
2025
Beijing Jiaotong University
2021-2024
Temporal sentence grounding in videos (TSGV) faces challenges because public TSGV datasets contain significant temporal biases, which are attributed to the uneven distributions of target moments. Existing methods generate augmented videos in which moments are forced to have varying locations. However, since the video lengths are given only small variations, merely changing the locations results in poor generalization to videos of different lengths. In this paper, we propose a novel training framework complemented by diversified data...
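The abstract is truncated here. As a rough illustration of the kind of augmentation it argues for (diversifying both the location and the length of target moments, not just the location), below is a minimal sketch; the function name, the resampling strategy, and the uniform sampling ranges are assumptions for illustration, not the paper's actual procedure.

```python
import numpy as np

def augment_moment(video_feats, start, end, rng=None, scale_range=(0.5, 1.5)):
    """Place the target moment at a random location with a random length.

    video_feats: (T, D) array of per-frame (or per-clip) features.
    start, end:  frame indices of the annotated target moment.
    Returns the augmented feature sequence and the new (start, end).
    Illustrative sketch only, not the paper's method.
    """
    assert end > start
    rng = rng or np.random.default_rng()
    moment = video_feats[start:end]
    context = np.concatenate([video_feats[:start], video_feats[end:]], axis=0)

    # Vary the moment length by nearest-neighbour resampling of its frames.
    scale = rng.uniform(*scale_range)
    new_len = max(1, int(round(len(moment) * scale)))
    idx = np.linspace(0, len(moment) - 1, new_len).round().astype(int)
    moment = moment[idx]

    # Vary the moment location by re-inserting it at a random offset
    # inside the remaining context.
    insert_at = rng.integers(0, len(context) + 1)
    augmented = np.concatenate(
        [context[:insert_at], moment, context[insert_at:]], axis=0)
    return augmented, (insert_at, insert_at + new_len)
```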
Recent advances in vision-language pre-training have significantly enhanced model capabilities on grounded object detection. However, these studies often pre-train with coarse-grained text prompts, such as plain category names and brief phrases. This limitation curtails the model's capacity for fine-grained linguistic comprehension and leads to a significant decline in performance when faced with detailed descriptions or contextual information. To tackle these problems, we develop DoGA: Detect objects Grouped...
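To make the coarse-versus-fine prompt distinction concrete, here is a minimal sketch of how a grounded detector scores candidate regions against text prompts, which may be either plain category names or detailed descriptions; the `text_encoder` interface and the cosine-similarity scoring are generic assumptions, not DoGA's architecture.

```python
import torch
import torch.nn.functional as F

def classify_regions(region_feats, text_encoder, prompts):
    """Score detected regions against text prompts by cosine similarity.

    region_feats : (R, D) visual features of candidate boxes.
    text_encoder : any module mapping a list of strings to (P, D) embeddings,
                   e.g. the text tower of a vision-language detector.
    prompts      : list of strings; either plain category names ("dog") or
                   detailed descriptions ("a brown dog lying on a red couch").
                   The abstract's point is that models pre-trained only on
                   the former degrade on the latter.
    """
    text_feats = F.normalize(text_encoder(prompts), dim=-1)   # (P, D)
    region_feats = F.normalize(region_feats, dim=-1)          # (R, D)
    return region_feats @ text_feats.t()                      # (R, P) scores
```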
Composed image retrieval aims at performing the retrieval task given a reference image and a complementary text piece. Since composing both sources of information can accurately model a user's search intent, composed retrieval can perform target-specific search and be potentially applied to many scenarios such as interactive product search. However, two key challenging issues must be addressed. One is how to fuse the heterogeneous image and text pieces of the query into a common feature space. The other is how to bridge the gap between the query pieces and the images in the database. To address these issues,...
The composed query image retrieval task aims to retrieve the target image in a database using a query that composes two different modalities: a reference image and a sentence declaring which details of the reference need to be modified or replaced with new elements. Tackling this task requires learning a multimodal embedding space that places semantically similar targets and queries close together while pushing dissimilar ones as far away as possible. Most existing methods start from the perspective of model structure and design clever interactive modules to promote better fusion of the two modalities. However,...
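A minimal sketch of the metric-learning objective described above: composed (reference + text) query embeddings are pulled toward their target image embeddings and pushed away from the other images in the batch. The in-batch InfoNCE-style loss and the temperature value are illustrative assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def composed_retrieval_loss(query_emb, target_emb, temperature=0.07):
    """In-batch contrastive loss for composed query image retrieval.

    query_emb  : (B, D) embeddings of composed queries (reference image
                 fused with the modification sentence).
    target_emb : (B, D) embeddings of the corresponding target images.
    Matching pairs share the same batch index; all other targets in the
    batch act as negatives, so similar query/target pairs are pulled
    together and dissimilar ones pushed apart.
    """
    q = F.normalize(query_emb, dim=-1)
    t = F.normalize(target_emb, dim=-1)
    logits = q @ t.t() / temperature                      # (B, B) similarities
    labels = torch.arange(q.size(0), device=q.device)     # diagonal = positives
    return F.cross_entropy(logits, labels)
```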
Composed image retrieval aims at retrieving the desired images given a reference image and a text piece. To handle this task, two important subprocesses should be modeled reasonably. One is to erase the details of the reference image that are irrelevant with respect to the text piece; the other is to replenish the desired details in it. Existing methods neglect to distinguish between the two and implicitly put them together to solve the composed retrieval task. To model them explicitly and in order, we propose a novel method which contains three key components, i.e., a Multi-semantic Dynamic Suppression module (MDS),...
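The erase/replenish idea can be sketched as a gated composition of image and text features: a learned gate suppresses the parts of the reference feature the text says should change, and a residual branch adds the new semantics. This TIRG-style gating is an illustrative stand-in under those assumptions, not the paper's MDS module.

```python
import torch
import torch.nn as nn

class EraseReplenish(nn.Module):
    """Illustrative gated composition of reference-image and text features."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.residual = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                      nn.Linear(dim, dim))

    def forward(self, img_feat, text_feat):
        joint = torch.cat([img_feat, text_feat], dim=-1)
        kept = self.gate(joint) * img_feat    # erase text-irrelevant details
        added = self.residual(joint)          # replenish the new details
        return kept + added
```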
Composed image retrieval (CIR) is an emerging and challenging research task that combines two modalities, a reference image and a modification text, into one query to retrieve the target image. In online shopping scenarios, a user would use the text as feedback to describe the difference between the desired image and the reference. To handle this task, two main problems need to be addressed. One is the localization problem: how to precisely find those spatial areas of the reference image mentioned by the text. The other is how to effectively modify the semantics of those areas based on the text. However,...
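The localization step can be pictured as text-to-region cross-attention over the reference image's spatial feature grid, producing weights over the areas the modification text refers to. The single-head module below is a sketch under that assumption, not the paper's actual design.

```python
import torch
import torch.nn as nn

class TextGuidedLocalizer(nn.Module):
    """Text queries attend over a spatial feature grid to locate the
    areas mentioned by the modification text (single-head for clarity)."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, text_feat, img_grid):
        # text_feat: (B, D) sentence embedding; img_grid: (B, HW, D) regions
        q = self.q(text_feat).unsqueeze(1)                  # (B, 1, D)
        k, v = self.k(img_grid), self.v(img_grid)           # (B, HW, D)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        located = attn @ v                                   # (B, 1, D)
        return located.squeeze(1), attn.squeeze(1)           # feature, weights
```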
Composed image retrieval (CIR) aims at fusing a reference image and text feedback to search for the desired images. Compared with general image retrieval, it can model users' intent more comprehensively and retrieve target images more accurately, which has significant impact in various real-world applications such as E-commerce and Internet search. However, because of the heterogeneous semantic gap, the synthetic understanding and fusion of both modalities are difficult to implement. In this work, to tackle the problem, we propose an end-to-end framework...
This paper investigates the research task of reconstructing a 3D clothed human body from a monocular image. Due to the inherent ambiguity of single-view input, existing approaches leverage pre-trained SMPL(-X) estimation models or generative models to provide auxiliary information for reconstruction. However, these methods capture only general geometry and overlook specific geometric details, leading to inaccurate skeleton reconstruction, incorrect joint positions, and unclear cloth wrinkles. In response to these issues, we...