- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Anomaly Detection Techniques and Applications
- Domain Adaptation and Few-Shot Learning
- Environmental Impact and Sustainability
- Advanced Vision and Imaging
- Video Analysis and Summarization
- Image Enhancement Techniques
- 3D Shape Modeling and Analysis
- Sustainable Industrial Ecology
- Visual Attention and Saliency Detection
- Face Recognition and Analysis
- Gait Recognition and Analysis
- Software-Defined Networks and 5G
- Topic Modeling
- Environmental Quality and Pollution
- Fire Detection and Safety Systems
- Image Retrieval and Classification Techniques
- Text and Document Classification Technologies
- Robotics and Sensor-Based Localization
- 3D Surveying and Cultural Heritage
- Sustainability and Ecological Systems Analysis
University of Science and Technology of China
2019-2025
First Affiliated Hospital of Henan University
2023-2025
Fuzhou University
2025
University of Chinese Academy of Sciences
2016-2024
Space Engineering University
2024
Ministry of Agriculture and Rural Affairs
2024
Chinese Academy of Agricultural Engineering
2024
National University of Defense Technology
2024
Peking University
2024
Chongqing Medical and Pharmaceutical College
2023-2024
In this paper, we formulate object tracking in a particle filter framework as a multi-task sparse learning problem, which we denote as Multi-Task Tracking (MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each particle is considered a single task in MTT. By employing popular sparsity-inducing ℓ_{p,q} mixed norms (p ∈ {2, ∞} and q = 1),...
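As a rough illustration of the ℓ_{p,q} mixed norm named in the abstract above, the sketch below computes it for a coefficient matrix. It assumes the common convention that rows correspond to dictionary templates and columns to particles, so the norm sums (for q = 1) the ℓ_p norms of the rows. The function name `mixed_norm` and the example matrix are hypothetical and not taken from the paper.

```python
import numpy as np


def mixed_norm(C, p=2, q=1):
    """l_{p,q} mixed norm: the l_q norm of the vector of row-wise l_p norms.

    Assumption: rows of C index dictionary templates, columns index particles.
    """
    C = np.asarray(C, dtype=float)
    # l_p norm of every row (p may be 2 or np.inf, as in the abstract)
    row_norms = np.linalg.norm(C, ord=p, axis=1)
    # l_q norm across rows; with q = 1 this is simply the sum of row norms
    return float(np.sum(row_norms ** q) ** (1.0 / q))


# Hypothetical coefficients: 3 templates x 2 particles.
C = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, -1.0]])

norm_21 = mixed_norm(C, p=2, q=1)         # sum of row-wise Euclidean norms
norm_inf1 = mixed_norm(C, p=np.inf, q=1)  # sum of row-wise max magnitudes
```

With q = 1 the penalty drives whole rows to zero together, which is one way to see why such norms encourage particles to share the same few templates.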
The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically by far exceeds standard baselines. The source...
In this paper, we propose a multi-task correlation particle filter (MCPF) for robust visual tracking. We first present the multi-task correlation filter (MCF) that takes the interdependencies among different features into account to learn correlation filters jointly. The proposed MCPF is designed to exploit and complement the strength of an MCF and a particle filter. Compared with existing tracking methods based on correlation filters and particle filters, the proposed tracker has several advantages. First, it can shepherd the sampled particles toward the modes of the target state distribution via the MCF, thereby resulting...
RGB-Infrared (IR) person re-identification is an important and challenging task due to large cross-modality variations between RGB and IR images. Most conventional approaches aim to bridge the cross-modality gap with feature alignment by feature representation learning. Different from existing methods, in this paper, we propose a novel end-to-end Alignment Generative Adversarial Network (AlignGAN) for the RGB-IR Re-ID task. The proposed model enjoys several merits. First, it can exploit pixel alignment and feature alignment jointly. To the best of our...
Tracking by Siamese networks has achieved favorable performance in recent years. However, most existing Siamese methods do not take full advantage of spatial-temporal target appearance modeling under different contextual situations. In fact, the spatial-temporal information can provide diverse features to enhance the target representation, and the context information is important for online adaptation of target localization. To comprehensively leverage the spatial-temporal structure of historical target exemplars and benefit from the context information, in this work, we present a novel Graph...
Cross-modality person re-identification (cm-ReID) is a challenging but key technology for intelligent video analysis. Existing works mainly focus on learning a modality-shared representation by embedding different modalities into the same feature space, lowering the upper bound of feature distinctiveness. In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics...
Occluded person re-identification (Re-ID) is a challenging task as persons are frequently occluded by various obstacles or other persons, especially in the crowd scenario. To address these issues, we propose a novel end-to-end Part-Aware Transformer (PAT) for occluded person Re-ID through diverse part discovery via a transformer encoder-decoder architecture, including a pixel context based transformer encoder and a part prototype based transformer decoder. The proposed PAT model enjoys several merits. First, to the best of our knowledge, this is the first work...
The key to image and sentence matching is to accurately measure the visual-semantic similarity between an image and a sentence. However, most existing methods make use of only the intra-modality relationship within each modality or the inter-modality relationship between image regions and sentence words for the cross-modal matching task. Different from them, in this work, we propose a novel MultiModality Cross Attention (MMCA) Network for image and sentence matching by jointly modeling the intra-modality and inter-modality relationships of image regions and sentence words in a unified deep model. In the proposed MMCA, we design a novel cross-attention mechanism, which is able to exploit not...
Convolutional Neural Network (CNN) based methods generally take crowd counting as a regression task by outputting crowd densities. They learn the mapping between image contents and crowd density distributions. Though having achieved promising results, these data-driven counting networks are prone to overestimate or underestimate people counts of regions with different density patterns, which degrades the whole count accuracy. To overcome this problem, we propose an approach to alleviate the counting performance differences in different regions....
In this paper, we aim at a practical system, magic closet, for automatic occasion-oriented clothing recommendation. Given a user-input occasion, e.g., wedding, shopping or dating, magic closet intelligently suggests the most suitable clothing from the user's own clothing photo album, or automatically pairs the user-specified reference clothing (upper-body or lower-body) with the most suitable one from online shops.
RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. However, this set-level alignment may lead to misalignment of some instances, which limits the performance for RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate...
Facial expression recognition (FER) is a challenging task due to different expressions under arbitrary poses. Most conventional approaches either perform face frontalization on a non-frontal facial image or learn separate classifiers for each pose. Different from existing methods, in this paper, we propose an end-to-end deep learning model by exploiting different poses and expressions jointly for simultaneous facial image synthesis and pose-invariant facial expression recognition. The proposed model is based on a generative adversarial network (GAN) and enjoys several...
Image-text matching has received growing interest since it bridges vision and language. The key challenge lies in how to learn correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics, while failing to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows learning the correspondence of object, relation and attribute separately, but also benefits learning the fine-grained correspondence of the structured phrase. This is...
Recently, with the ever-growing action categories, zero-shot action recognition (ZSAR) has been achieved by automatically mining the underlying concepts (e.g., actions, attributes) in videos. However, most existing methods only exploit the visual cues of these concepts but ignore external knowledge information for modeling explicit relationships between them. In fact, humans have a remarkable ability to transfer knowledge learned from familiar classes to recognize unfamiliar classes. To narrow the knowledge gap between existing methods and humans, we propose an...
In this paper, we propose a multi-task correlation particle filter (MCPF) for robust visual tracking. We first present the multi-task correlation filter (MCF) that takes the interdependencies among different object parts and features into account to learn the correlation filters jointly. Next, the proposed MCPF is introduced to exploit and complement the strength of an MCF and a particle filter. Compared with existing tracking methods based on correlation filters and particle filters, the proposed MCPF enjoys several merits. First, it exploits the interdependencies among different features to derive the correlation filters jointly, and makes the learned filters enhance each other to obtain consistent responses....
In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking. The proposed SCF model takes part-based tracking strategies into account in a correlation filter tracker, and exploits circular shifts of all parts for their motion modeling to preserve the target object structure. Compared with existing correlation filter trackers, our proposed tracker has several advantages: (1) Due to the part strategy, the learned structural correlation filters are less sensitive to partial occlusion, and have computational efficiency and robustness. (2) The learned filters are able to not only...
Learning semantic correspondence between image and text is significant as it bridges the semantic gap between vision and language. The key challenge is to accurately find and correlate shared semantics in image and text. Most existing methods achieve this goal by representing the shared semantic as a weighted combination of all the fragments (image regions or text words), where fragments relevant to the shared semantic obtain more attention, otherwise less. However, although the relevant ones contribute more to the shared semantic, the irrelevant ones will more or less disturb it, and thus lead to semantic misalignment in the correlation phase. To address this issue,...
Geometry Projection is a powerful depth estimation method in monocular 3D object detection. It estimates depth dependent on heights, which introduces mathematical priors into the deep model. But the projection process also introduces the error amplification problem, in which the error of the estimated height will be amplified and reflected greatly at the output depth. This property leads to uncontrollable depth inferences and also damages the training efficiency. In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages....
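The error amplification described in the abstract above can be sketched with the standard pinhole relation, depth = focal length × 3D height / 2D pixel height. The snippet below is only a minimal illustration under that assumption. The function name `projected_depth` and all numeric values (focal length, heights, the 0.1 m height error) are hypothetical and are not taken from the paper.

```python
def projected_depth(focal_px, height_3d_m, height_2d_px):
    """Depth from the pinhole relation: d = f * H_3d / h_2d."""
    return focal_px * height_3d_m / height_2d_px


# Hypothetical values chosen for illustration only.
focal_px = 700.0       # focal length in pixels
height_2d_px = 35.0    # projected object height in the image
true_height_m = 1.5    # true 3D object height
height_error_m = 0.1   # assumed error in the estimated 3D height

depth_true = projected_depth(focal_px, true_height_m, height_2d_px)
depth_est = projected_depth(focal_px, true_height_m + height_error_m, height_2d_px)

# The height error is scaled by f / h_2d before it shows up in the depth.
amplification = focal_px / height_2d_px
depth_error_m = depth_est - depth_true

print(f"depth error: {depth_error_m:.2f} m (amplification x{amplification:.0f})")
```

Because the scaling factor f / h_2d grows as the projected height shrinks, distant (small) objects suffer the largest depth errors for the same height error.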
Lane detection is a challenging task that requires predicting complex topology shapes of lane lines and distinguishing different types of lanes simultaneously. Earlier works follow a top-down roadmap to regress predefined anchors into various shapes of lane lines, which lacks enough flexibility to fit complex shapes of lanes due to the fixed anchor shapes. Lately, some works propose to formulate lane detection as a keypoint estimation problem to describe the shapes of lane lines more flexibly and gradually group adjacent keypoints belonging to the same lane line in a point-by-point manner, which is inefficient...
Sparse representation has been applied to visual tracking by finding the best target candidate with minimal reconstruction error by use of target templates. However, most sparse representation based trackers only consider holistic or local representations and do not make full use of the intrinsic structure among and inside target candidates, thereby making the representation less effective when similar objects appear or under occlusion. In this paper, we propose a novel Structural Sparse Tracking (SST) algorithm, which exploits the intrinsic relationship among target candidates and their local patches...