NFDI4DS | UHH-SEMS - Publication Details

Zheng-Jun Zha

ORCID: 0000-0003-2510-8993

About

Contact & Profiles

A5003217535

Research Areas

Advanced Image and Video Retrieval Techniques
Multimodal Machine Learning Applications
Human Pose and Action Recognition
Domain Adaptation and Few-Shot Learning
Video Surveillance and Tracking Methods
Image Retrieval and Classification Techniques
Advanced Image Processing Techniques
Advanced Neural Network Applications
Image Enhancement Techniques
Video Analysis and Summarization
Generative Adversarial Networks and Image Synthesis
Advanced Vision and Imaging
Image and Signal Denoising Methods
Gait Recognition and Analysis
Image Processing Techniques and Applications
Topic Modeling
Advanced Image Fusion Techniques
Digital Media Forensic Detection
Face and Expression Recognition
Visual Attention and Saliency Detection
Face recognition and analysis
Text and Document Classification Technologies
COVID-19 diagnosis using AI
Handwritten Text Recognition Techniques
Anomaly Detection Techniques and Applications

University of Science and Technology of China
2016-2025

Hebei Agricultural University
2025

Beijing Information Science & Technology University
2025

Chinese Academy of Sciences
2013-2024

Institute of Computing Technology
2020-2024

University of Science and Technology Chittagong
2023

Tianjin University
2022

University of Science and Technology Beijing
2020

Microsoft Research (United Kingdom)
2020

City University of Hong Kong
2019

Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

OPENALEX - Publications

Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li and 1 more

Due to the popularity of social media websites, extensive research efforts have been dedicated to tag-based image search. Both visual information and tags have been investigated in this field. However, most existing methods use these characteristics either separately or sequentially in order to estimate the relevance of images. In this paper, we propose an approach that simultaneously utilizes both visual and textual information to estimate the relevance of user-tagged images. The relevance estimation is determined with a hypergraph learning approach. In this method, a hypergraph is constructed, where vertices represent...

10.1109/tip.2012.2202676 article EN IEEE Transactions on Image Processing 2012-06-05

Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition

Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, and they often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals with a Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. Specifically, TASN consists of 1) a trilinear attention module, which generates attention maps...

10.1109/cvpr.2019.00515 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Adaptive Transfer Network for Cross-Domain Person Re-Identification

Jiawei Liu, Zheng-Jun Zha, Di Chen, Richang Hong, Meng Wang

Recent deep learning based person re-identification approaches have steadily improved performance on benchmarks; however, they often fail to generalize well from one domain to another. In this work, we propose a novel adaptive transfer network (ATNet) for effective cross-domain person re-identification. ATNet looks into the essential causes of the domain gap and addresses it following the principle of "divide-and-conquer". It decomposes the complicated cross-domain transfer into a set of factor-wise sub-transfers, each of which concentrates on style transfer with...

10.1109/cvpr.2019.00737 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Object Relational Graph With Teacher-Recommended Learning for Video Captioning

Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang and 2 more

Taking full advantage of the information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation due to the neglect of interaction between objects, and lack sufficient training for content-related words due to long-tailed problems. In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich the visual representation. Meanwhile, we design...

10.1109/cvpr42600.2020.01329 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Event Driven Web Video Summarization by Tag Localization and Key-Shot Identification

Meng Wang, Richang Hong, Guangda Li, Zheng-Jun Zha, Shuicheng Yan and 1 more

With the explosive growth of web videos on the Internet, it becomes challenging to efficiently browse hundreds or even thousands of videos. When searching an event query, users are often bewildered by the vast quantity of web videos returned by search engines. Exploring such results is time consuming and also degrades the user experience. In this paper, we present an approach for event driven web video summarization by tag localization and key-shot mining. We first localize the tags that are associated with each video into its shots. Then, we estimate...

10.1109/tmm.2012.2185041 article EN IEEE Transactions on Multimedia 2012-01-31

MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition

Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng

Human actions in videos are three-dimensional (3D) signals. Recent attempts use 3D convolutional neural networks (CNNs) to explore spatio-temporal information for human action recognition. Though promising, 3D CNNs have not achieved high performance on this task with respect to their well-established two-dimensional (2D) counterparts for visual recognition in still images. We argue that the high training complexity of spatio-temporal fusion and the huge memory cost of 3D convolution hinder current 3D CNNs, which stack 3D convolutions layer...

10.1109/cvpr.2018.00054 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018-06-01

Learning to Assemble Neural Module Tree Networks for Visual Grounding

Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Feng Wu

Visual grounding, a task to ground (i.e., localize) natural language in images, essentially requires composite visual reasoning. However, existing methods over-simplify the composite nature of language into a monolithic sentence embedding or a coarse composition of a subject-predicate-object triplet. In this paper, we propose to ground natural language in an intuitive, explainable, and composite fashion, as it should be. In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes visual grounding along the dependency parsing tree...

10.1109/iccv.2019.00477 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Mining Travel Patterns from Geotagged Photos

Yan-Tao Zheng, Zheng-Jun Zha, Tat‐Seng Chua

Recently, the phenomenal advent of photo-sharing services, such as Flickr and Panoramio, has led to voluminous community-contributed photos with text tags, timestamps, and geographic references on the Internet. The photos, together with their time- and geo-references, become the digital footprints of photo takers and implicitly document their spatiotemporal movements. This study aims to leverage the wealth of these enriched online photos to analyze people’s travel patterns at the local level of a tour destination. Specifically, we focus our analysis...

10.1145/2168752.2168770 article EN ACM Transactions on Intelligent Systems and Technology 2012-05-01

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu and 1 more

Scene text detection has witnessed rapid development in recent years. However, two main challenges remain: 1) many methods suffer from false positives in their text representations; 2) the large scale variance of scene texts makes it hard for the network to learn samples. In this paper, we propose ContourNet, which effectively handles these two problems, taking a further step toward accurate arbitrary-shaped text detection. First, a scale-insensitive Adaptive Region Proposal Network (Adaptive-RPN) is...

10.1109/cvpr42600.2020.01177 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Towards a new generation of artificial intelligence in China

Fei Wu, Cewu Lu, Mingjie Zhu, Hao Chen, Jun Zhu and 10 more

10.1038/s42256-020-0183-4 article EN Nature Machine Intelligence 2020-06-16

Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification

Dechao Meng, Liang Li, Xuejing Liu, Yadong Li, Shijie Yang and 4 more

Vehicle re-identification aims to find images of the same vehicle from various views in a cross-camera scenario. The main challenges of this task are the large intra-instance distance caused by different views and the subtle inter-instance discrepancy between similar vehicles. In this paper, we propose a parsing-based view-aware embedding network (PVEN) to achieve view-aware feature alignment and enhancement for vehicle ReID. First, we introduce a parsing network to parse a vehicle into four different views, and then align the features by mask average pooling. Such alignment provides a fine-grained representation...

10.1109/cvpr42600.2020.00713 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Multi-Scale Triplet CNN for Person Re-Identification

Jiawei Liu, Zheng-Jun Zha, Qi Tian, Dong Liu, Ting Yao and 2 more

Person re-identification aims at identifying a certain person across non-overlapping multi-camera networks. It is a fundamental and challenging task in automated video surveillance. Most existing research mainly relies on hand-crafted features, resulting in unsatisfactory performance. In this paper, we propose a multi-scale triplet convolutional neural network which captures the visual appearance of a person at various scales. We propose to optimize the network parameters by a comparative similarity loss on massive sample triplets,...

10.1145/2964284.2967209 article EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

A Fast Uyghur Text Detector for Complex Background Images

Chenggang Yan, Hongtao Xie, Jianjun Chen, Zheng-Jun Zha, Xinhong Hao and 2 more

Uyghur text localization in images with complex backgrounds is a challenging yet important task for many applications. Generally, Uyghur characters in images consist of strokes with uniform features, and they are distinct from the background in color, intensity, and texture. Based on these differences, we propose FASTroke, a keypoint extractor which is fast and stroke-specific. Compared with the commonly used MSER detector, FASTroke produces less than twice the amount of components and recognizes at least 10% more characters. While Uyghur text lines usually have features such as...

10.1109/tmm.2018.2838320 article EN IEEE Transactions on Multimedia 2018-05-18

Group-aware Label Transfer for Domain Adaptive Person Re-identification

Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo and 1 more

Unsupervised Domain Adaptive (UDA) person re-identification (ReID) aims at adapting a model trained on a labeled source-domain dataset to a target-domain dataset without any further annotations. Most successful UDA-ReID approaches combine clustering-based pseudo-label prediction with representation learning and perform the two steps in an alternating fashion. However, the offline interaction between these two steps may allow noisy pseudo labels to substantially hinder the capability of the model. In this paper, we propose...

10.1109/cvpr46437.2021.00527 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Camera Lens Super-Resolution

Chang Chen, Zhiwei Xiong, Xinmei Tian, Zheng-Jun Zha, Feng Wu

Existing methods for single image super-resolution (SR) are typically evaluated with synthetic degradation models such as bicubic or Gaussian downsampling. In this paper, we investigate SR from the perspective of camera lenses, named CameraSR, which aims to alleviate the intrinsic tradeoff between resolution (R) and field-of-view (V) in realistic imaging systems. Specifically, we view the R-V degradation as a latent model in the SR process and learn to reverse it with low- and high-resolution image pairs. To obtain the paired images, we propose two novel...

10.1109/cvpr.2019.00175 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization

Chuanbin Liu, Hongtao Xie, Zheng-Jun Zha, Lingfeng Ma, Lingyun Yu and 1 more

Delicate attention to the discriminative regions plays a critical role in Fine-Grained Visual Categorization (FGVC). Unfortunately, most existing attention models perform poorly in FGVC due to pivotal limitations in discriminative region proposing and region-based feature learning. 1) The discriminative regions are predominantly located based on filter responses over the images, which cannot be directly optimized with a performance metric. 2) Existing methods train the region-based feature extractor as a one-hot classification task individually, while neglecting the knowledge from...

10.1609/aaai.v34i07.6822 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning

Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zheng-Jun Zha and 1 more

Generalized zero-shot learning aims to recognize images from seen and unseen domains. Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between the two domains, while ignoring the effect of semantic-free visual representation in alleviating the biased recognition problem. In this paper, we propose a novel Domain-aware Visual Bias Eliminating (DVBE) network that constructs two complementary visual representations, i.e., semantic-free and semantic-aligned, to treat the seen and unseen domains separately. Specifically, we explore...

10.1109/cvpr42600.2020.01268 preprint EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification

Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zheng-Jun Zha

Many unsupervised domain adaptive (UDA) person ReID approaches combine clustering-based pseudo-label prediction with feature fine-tuning. However, because of the domain gap, the pseudo-labels are not always reliable and some labels are noisy or incorrect. This can mislead representation learning and deteriorate performance. In this paper, we propose to estimate and exploit the credibility of the pseudo-label assigned to each sample to alleviate the influence of noisy labels, by suppressing the contribution of noisy samples. We build our baseline framework using...

10.1609/aaai.v35i4.16468 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Image De-Raining Transformer

Jie Xiao, Xueyang Fu, Aiping Liu, Feng Wu, Zheng-Jun Zha

Existing deep learning based de-raining approaches have resorted to convolutional architectures. However, the intrinsic limitations of convolution, including local receptive fields and independence of input content, hinder the model's ability to capture long-range and complicated rainy artifacts. To overcome these limitations, we propose an effective and efficient transformer-based architecture for image de-raining. First, we introduce general priors of vision tasks, i.e., locality and hierarchy, into the network so that...

10.1109/tpami.2022.3183612 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-06-16

Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning

Kai Zhu, Yang Cao, Wei Zhai, Jie Cheng, Zheng-Jun Zha

Few-shot class-incremental learning aims to recognize new classes given few samples without forgetting old classes. It is a challenging task since representation optimization and prototype reorganization can only be achieved under little supervision. To address this problem, we propose a novel incremental prototype learning scheme. Our scheme consists of a random episode selection strategy that adapts the feature representation to various generated incremental episodes to enhance the corresponding extensibility, and a self-promoted prototype refinement mechanism which...

10.1109/cvpr46437.2021.00673 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Rain Streak Removal via Dual Graph Convolutional Network

Xueyang Fu, Qi Qi, Zheng-Jun Zha, Yurui Zhu, Xinghao Ding

Deep convolutional neural networks (CNNs) have become dominant in the single image de-raining area. However, most deep CNN-based de-raining methods are designed by stacking vanilla convolutional layers, which can only model local relations. Therefore, long-range contextual information is rarely considered for this specific task. To address this problem, we propose a simple yet effective dual graph convolutional network (GCN) for single image rain removal. Specifically, we design two graphs to perform global relational modeling and...

10.1609/aaai.v35i2.16224 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Disentangle Your Dense Object Detector

Zehui Chen, Chenhongyi Yang, Qiaofei Li, Feng Zhao, Zheng-Jun Zha and 1 more

Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding. However, the current training pipeline for dense detectors is compromised by many conjunctions that may not hold. In this paper, we investigate three such important conjunctions: 1) only samples assigned as positive in the classification head are used to train the regression head; 2) classification and regression share the same input feature and computational fields defined by the parallel...

10.1145/3474085.3475351 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning

Kai Zhu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha

Non-exemplar class-incremental learning aims to recognize both old and new classes when old class samples cannot be saved. It is a challenging task since representation optimization and feature retention can only be achieved under supervision from new classes. To address this problem, we propose a novel self-sustaining representation expansion scheme. Our scheme consists of a structure reorganization strategy that fuses main-branch expansion and side-branch updating to maintain the old features, and a distillation strategy to transfer invariant knowledge. Furthermore,...

10.1109/cvpr52688.2022.00908 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-identification

Zhipeng Huang, Jiawei Liu, Liang Li, Kecheng Zheng, Zheng-Jun Zha

RGB-infrared person re-identification is an emerging cross-modality re-identification task, which is very challenging due to the significant modality discrepancy between RGB and infrared images. In this work, we propose a novel modality-adaptive mixup and invariant decomposition (MID) approach for RGB-infrared person re-identification, aimed at learning modality-invariant and discriminative representations. MID designs a modality-adaptive mixup scheme to generate suitable mixed-modality images, mitigating the inherent modality discrepancy at the pixel level. It formulates the mixup procedure as a Markov decision process, where...

10.1609/aaai.v36i1.19987 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning

Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Qingming Huang

Change captioning aims to describe the semantic change between two similar images. In this process, viewpoint change, as the most typical distractor, leads to pseudo changes in the appearance and position of objects, thereby overwhelming the real change. Besides, since the visual signal of change appears in a local region with weak features, it is difficult for the model to directly translate the learned change features into a sentence. In this paper, we propose a syntax-calibrated multi-aspect relation transformer to learn effective change features under different scenes,...

10.1109/tpami.2024.3365104 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-02-13
