Zhaoxiang Zhang

ORCID: 0000-0003-2648-3875
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Video Surveillance and Tracking Methods
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Advanced Vision and Imaging
  • Face recognition and analysis
  • Robotics and Sensor-Based Localization
  • Gait Recognition and Analysis
  • Face and Expression Recognition
  • Generative Adversarial Networks and Image Synthesis
  • Anomaly Detection Techniques and Applications
  • Visual Attention and Saliency Detection
  • Image Enhancement Techniques
  • Advanced Image Processing Techniques
  • Image Processing Techniques and Applications
  • Advanced Steganography and Watermarking Techniques
  • COVID-19 diagnosis using AI
  • Topic Modeling
  • Digital Media Forensic Detection
  • 3D Surveying and Cultural Heritage
  • Video Analysis and Summarization
  • Biometric Identification and Security
  • Image Processing and 3D Reconstruction

Xinjiang Agricultural University
2025

Northwestern Polytechnical University
2024-2025

Institute of Automation
2015-2024

Chinese Academy of Sciences
2015-2024

University of Chinese Academy of Sciences
2017-2024

Wellcome Centre for Anti-Infectives Research
2023-2024

Xijing Hospital
2024

Institute of Software
2024

Beijing Academy of Artificial Intelligence
2020-2024

Centre for Artificial Intelligence and Robotics
2021-2024

Scale variation is one of the key challenges in object detection. In this work, we first present a controlled experiment to investigate the effect of receptive fields for scale variation in object detection. Based on the findings from the exploration experiments, we propose a novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power. We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields. Then, we adopt a scale-aware...

10.1109/iccv.2019.00615 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
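
The weight-sharing idea in the abstract above (branches with the same parameters but different receptive fields) can be illustrated with a minimal sketch. It assumes that the receptive field is varied through the dilation rate of a convolution, which the truncated abstract does not state; the 1D signal, the kernel values, and the function names are hypothetical and chosen only for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D correlation with the taps of `kernel` spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def trident_branches(signal, shared_kernel, dilations=(1, 2, 3)):
    """Apply the same kernel at several dilation rates, one output per branch."""
    return {d: dilated_conv1d(signal, shared_kernel, d) for d in dilations}


signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
shared_kernel = [1, 0, -1]  # hypothetical weights: one parameter set reused by every branch
outputs = trident_branches(signal, shared_kernel)
for d, out in outputs.items():
    receptive_field = (len(shared_kernel) - 1) * d + 1
    print(f"dilation={d} receptive_field={receptive_field} output={out}")
```

Every branch reads the same three weights, so the parameter count does not grow with the number of branches; only the spacing of the taps, and therefore the receptive field, changes.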

This paper proposes a novel method to address the problem of estimating the number of people in surveillance scenes with people gathering and waiting. The proposed method combines a MID (mosaic image difference) based foreground segmentation algorithm and a HOG (histograms of oriented gradients) based head-shoulder detection algorithm to provide an accurate estimation of people counts in the observed area. In our framework, the MID-based foreground segmentation module provides active areas for the head-shoulder detection module to detect heads and count the number of people. Numerous experiments are conducted and convincing results demonstrate...

10.1109/icpr.2008.4761705 article EN Proceedings - International Conference on Pattern Recognition 2008-12-01

In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use...

10.1109/cvpr46437.2021.01444 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Projective analysis is an important solution for 3D shape retrieval, since human visual perceptions of 3D shapes rely on various 2D observations from different view points. Although multiple informative and discriminative views are utilized, most projection-based retrieval systems suffer from heavy computational cost, thus cannot satisfy the basic requirement of scalability for search engines. In this paper, we present a real-time 3D shape search engine based on the projective images of 3D shapes. The real-time property of our search engine results from the following...

10.1109/cvpr.2016.543 article EN 2016-06-01

As a typical cross-modal problem, image-text bi-directional retrieval relies heavily on the joint embedding learning and similarity measure for each image-text pair. It remains challenging because prior works seldom explore semantic correspondences between modalities and semantic correlations in a single modality at the same time. In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. Specifically, it...

10.1109/cvpr42600.2020.00359 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We have witnessed rapid evolution of deep neural network architecture design in the past years. These latest progresses greatly facilitate the developments in various areas such as computer vision and natural language processing. However, along with the extraordinary performance, these state-of-the-art models also bring in expensive computational cost. Directly deploying these models into applications with real-time requirement is still infeasible. Recently, Hinton et al. have shown that the dark knowledge within a powerful teacher...

10.1609/aaai.v32i1.11783 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-29

Video object detection (VID) has been a rising research direction in recent years. A central issue of VID is the appearance degradation of video frames caused by fast motion. This problem is essentially ill-posed for a single frame. Therefore, aggregating features from other frames becomes a natural choice. Existing methods rely heavily on optical flow or recurrent neural networks for feature aggregation. However, these methods emphasize more on the temporally nearby frames. In this work, we argue that aggregating features in the full-sequence level...

10.1109/iccv.2019.00931 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Image-level weakly-supervised semantic segmentation (WSSS) aims at learning semantic segmentation by adopting only image class labels. Existing approaches generally rely on class activation maps (CAM) to generate pseudo-masks and then train segmentation models. The main difficulty is that the CAM estimate only covers partial foreground objects. In this paper, we argue that the critical factor preventing to obtain the full object mask is the classification boundary mismatch problem in applying the CAM to WSSS. Because the CAM is optimized by the classification task, it focuses on the discrimination across...

10.1109/cvpr42600.2020.00434 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
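
The abstract above refers to the common baseline of deriving pseudo-masks from class activation maps. As a rough sketch of that baseline only (not of the method the paper proposes), a CAM can be computed as the classifier-weighted sum of the final feature maps and then thresholded. The feature values, class weights, threshold, and function names below are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps (channels x H x W) using one class's weights."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
            for x in range(width)
        ]
        for y in range(height)
    ]


def pseudo_mask(cam, threshold):
    """Binarize a CAM: 1 where the activation exceeds `threshold`, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in cam]


# Two hypothetical 2x2 feature channels and the classifier weights of one class.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 1.0]],
]
weights = [0.5, 1.0]
cam = class_activation_map(features, weights)
mask = pseudo_mask(cam, threshold=0.75)  # threshold value is an assumption
print(cam)
print(mask)
```

Only the pixels with the strongest class evidence survive the threshold, which is one way to see why CAM-based pseudo-masks tend to cover only part of an object.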

Weakly supervised semantic segmentation with only image-level labels saves large human effort to annotate pixel-level labels. Cutting-edge approaches rely on various innovative constraints and heuristic rules to generate the masks for every single image. Although great progress has been achieved by these methods, they treat each single image independently and do not take account of the relationships across different images. In this paper, however, we argue that the cross-image relationship is vital for weakly...

10.1609/aaai.v34i07.6705 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

In this paper, we propose an anchor-free single-stage LiDAR-based 3D object detector, RangeDet. The most notable difference with previous works is that our method is purely based on the range view representation. Compared with the commonly used voxelized or Bird's Eye View (BEV) representations, the range view representation is more compact and without quantization error. Although there are works adopting it for semantic segmentation, its performance in object detection is largely behind voxelized or BEV counterparts. We first analyze the existing...

10.1109/iccv48922.2021.00291 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
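
For readers unfamiliar with the range view representation mentioned in the abstract above, the following sketch shows one common way to build a range image: each LiDAR point is mapped to a pixel by its azimuth and inclination, and the pixel stores the distance to the sensor. This is a generic spherical projection, not code from the paper; the image size, the vertical field of view, and the function name are assumptions made for illustration.

```python
import math


def to_range_image(points, height=4, width=8, fov_up_deg=10.0, fov_down_deg=-10.0):
    """Project (x, y, z) points into a height x width grid of ranges.

    Image size and vertical field of view are hypothetical defaults.
    Empty pixels hold None; when points collide, the nearest one is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)       # horizontal angle in [-pi, pi]
        inclination = math.asin(z / r)   # vertical angle in [-pi/2, pi/2]
        col = math.floor((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = math.floor((fov_up - inclination) / (fov_up - fov_down) * height)
        if not 0 <= row < height:
            continue  # outside the vertical field of view
        if image[row][col] is None or r < image[row][col]:
            image[row][col] = r
    return image


points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (5.0, 0.0, 5.0)]
range_image = to_range_image(points)
for row in range_image:
    print(row)
```

The grid follows the sensor's own angular sampling, so the points are not binned into voxels; the trade-off is that one object occupies very different numbers of pixels depending on its distance.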

Learned image compression methods have exhibited superior rate-distortion performance compared with classical image compression standards. Most existing learned image compression models are based on Convolutional Neural Networks (CNNs). Despite great contributions, a main drawback of CNN based model is that its structure is not designed for capturing local redundancy, especially the nonrepetitive textures, which severely affects the reconstruction quality. Therefore, how to make full use of both global structure and local texture becomes the core problem for learning-based...

10.1109/cvpr52688.2022.01697 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. Previously, most instance segmentation methods heavily rely on object detection and perform mask prediction based on bounding boxes or dense centers. In contrast, we propose a sparse set of instance activation maps, as a new object representation, to highlight informative regions for each foreground object. Then instance-level features are obtained by aggregating features according to the highlighted regions for recognition and segmentation. Moreover,...

10.1109/cvpr52688.2022.00439 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Acquiring sufficient ground-truth supervision to train deep visual models has been a bottleneck over the years due to the data-hungry nature of deep learning. This is exacerbated in some structured prediction tasks, such as semantic segmentation, which requires pixel-level annotations. This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic...

10.1609/aaai.v35i3.16294 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Most existing unsupervised person re-identification (Re-ID) methods use clustering to generate pseudo labels for model training. Unfortunately, clustering sometimes mixes different true identities together or splits the same identity into two or more sub clusters. Training on these noisy clusters substantially hampers the Re-ID accuracy. Due to the limited samples in each identity, we suppose there may lack some underlying information to well reveal the accurate clusters. To discover this information, we propose an Implicit Sample Extension...

10.1109/cvpr52688.2022.00722 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

The goal of domain generalization (DG) is to enhance the generalization capability of the model learned from a source domain to other unseen domains. The recently developed Sharpness-Aware Minimization (SAM) method aims to achieve this goal by minimizing the sharpness measure of the loss landscape. Though SAM and its variants have demonstrated impressive DG performance, they may not always converge to the desired flat region with a small loss value. In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching...

10.1109/cvpr52729.2023.00367 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
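
As background for the abstract above, the basic SAM update can be sketched in a few lines: the weights are first moved a distance rho along the normalized gradient, and the gradient measured at that perturbed point is then used for the descent step. This is a minimal sketch of plain SAM, not of the gradient matching algorithm the paper proposes; the quadratic toy loss, the learning rate, and rho are hypothetical values.

```python
import math

CURVATURES = (1.0, 10.0)  # hypothetical toy loss: 0.5 * sum(a_i * w_i ** 2)


def loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURES, w))


def grad(w):
    return [a * wi for a, wi in zip(CURVATURES, w)]


def sam_step(w, lr=0.05, rho=0.05):
    """One SAM update: ascend by rho along the normalized gradient, then
    descend from the original weights using the gradient found there."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
initial_loss = loss(w)
for _ in range(200):
    w = sam_step(w)
final_loss = loss(w)
print(initial_loss, final_loss)
```

With a fixed rho the iterates settle in a small neighborhood of the minimum rather than exactly on it, because the perturbation never shrinks.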

With the surge of deep learning techniques, the field of person re-identification has witnessed rapid progress in recent years. Deep learning based methods focus on learning a discriminative feature space where data points are clustered compactly according to their corresponding identities. Most existing methods process data points individually or only involve a fraction of samples while building a similarity structure. They ignore dense informative connections among samples more or less. The lack of holistic observation eventually leads to inferior...

10.1109/iccv.2019.00508 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Sufficient training data normally is required to train deeply learned models. However, due to the expensive manual process for labelling large number of images, the amount of available training data is always limited. To produce more data for training a deep network, Generative Adversarial Network (GAN) can be used to generate artificial sample data. However, the generated data usually does not have annotation labels. To solve this problem, in this paper, we propose a virtual label called Multi-pseudo Regularized Label (MpRL) and assign it to the generated data. With MpRL, the generated data will be used as...

10.1109/tip.2018.2874715 article EN IEEE Transactions on Image Processing 2018-10-08

Pedestrian attribute recognition has been an emerging research topic in the area of video surveillance. To predict the existence of a particular attribute, it is demanded to localize the regions related to the attribute. However, in this task, the region annotations are not available. How to carve out these attribute-related regions remains challenging. Existing methods applied attribute-agnostic visual attention or heuristic body-part localization mechanisms to enhance the local feature representations, while neglecting to employ...

10.1109/iccv.2019.00510 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

With the fast development of effective and low-cost human skeleton capture systems, skeleton-based action recognition has attracted much attention recently. Most existing methods use Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) to extract spatio-temporal information embedded in the skeleton sequences for action recognition. However, these approaches are limited in the ability of relational modeling in a single skeleton, due to the loss of important structural information when converting the raw skeleton data to adapt to the input format of CNN or RNN. In this...

10.1109/icme.2019.00147 article EN 2019 IEEE International Conference on Multimedia and Expo (ICME) 2019-07-01

Data association across frames is at the core of Multiple Object Tracking (MOT) task. This problem is usually solved by a traditional graph-based optimization or directly learned via deep learning. Despite their popularity, we find some points worth studying in the current paradigm: 1) Existing methods mostly ignore the context information among tracklets and intra-frame detections, which makes the tracker hard to survive in challenging cases like severe occlusion. 2) The end-to-end association methods solely rely on the data...

10.1109/cvpr46437.2021.00526 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

The two-stage methods for instance segmentation, e.g. Mask R-CNN, have achieved excellent performance recently. However, the segmented masks are still very coarse due to the downsampling operations in both the feature pyramid and the instance-wise pooling process, especially for large objects. In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner. Through fusing more detailed...

10.1109/cvpr46437.2021.00679 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Existing works have designed end-to-end frameworks based on Faster-RCNN for person search. Due to the large receptive fields in deep networks, the feature maps of each proposal, cropped from the stem feature maps, involve redundant context information outside the bounding boxes. However, person search is a fine-grained task which needs accurate appearance information. Such context information can make the model fail to focus on persons, so the learned representations lack the capacity to discriminate various identities. To address this issue, we propose...

10.1109/cvpr42600.2020.00291 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01