- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Face recognition and analysis
- Human Pose and Action Recognition
- Adversarial Robustness in Machine Learning
- Diverse Approaches in Healthcare and Education Studies
- Data Visualization and Analytics
- Advanced Chemical Sensor Technologies
- Air Quality Monitoring and Forecasting
- Cancer-related molecular mechanisms research
- Emotion and Mood Recognition
- Image Retrieval and Classification Techniques
- Gait Recognition and Analysis
- Face and Expression Recognition
- Robotics and Sensor-Based Localization
Group Sense (China)
2019-2025
Beijing Sport University
2024
Recently, low-bit (e.g., 8-bit) network quantization has been extensively studied to accelerate inference. Beyond inference, training with quantized gradients can bring even greater acceleration, since the backward process is often computation-intensive. Unfortunately, inappropriate quantization of backward propagation usually makes training unstable and can even make it crash. A successful unified framework that supports diverse networks on various tasks is still lacking. In this paper, we attempt to build an 8-bit (INT8)...
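The abstract does not state which quantizer is used for gradients. As a minimal sketch, assuming plain symmetric per-tensor INT8 quantization (a common baseline, not necessarily the paper's scheme), a gradient tensor can be mapped to INT8 and back as follows. The helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative baseline only).

    scale = max|x| / 127, q = clamp(round(x / scale), -127, 127).
    Python's round() rounds to nearest, with ties going to even.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor, which would give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to floats; the error per element is at most scale / 2."""
    return [qi * scale for qi in q]


if __name__ == "__main__":
    grads = [0.5, -1.27, 0.0, 0.01]
    q, scale = quantize_int8(grads)
    print(q, scale, dequantize_int8(q, scale))

    # A single large outlier inflates the scale, so small gradients round to 0.
    print(quantize_int8([100.0, 0.1])[0])
```

The outlier case shows one plausible reason naive gradient quantization can destabilize training: when one large value sets the scale, the small gradients collapse to zero and their update signal is lost.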
The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While previous works [6], [8], [17] focus on the effectiveness of the MLP for unsupervised image classification, where pretraining and evaluation are conducted on the same dataset, we reveal that...
Visual relationships are crucial for visual perception and reasoning, and cover tasks such as Scene Graph Generation, Human-Object Interaction, and object affordance. Despite significant efforts, this field still suffers from the following limitations: specialists are built for a specific task without considering similar ones, task formulations are strict and complex with limited flexibility, and reasoning and language knowledge are underexploited. To solve these limitations, we seek to build a new framework, one model for all tasks, over Large...
For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident. In real-world scenarios, erasure requests can originate at any time from both users and model owners, and these requests usually form a sequence. Therefore, under such a setting, selected information is expected to be continuously removed while the rest is maintained. We define this problem as continual forgetting and identify three key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii)...
Recently, person re-identification (ReID) has developed quickly due to its broad practical applications, and various settings have been proposed, e.g., traditional ReID, clothes-changing ReID, and visible-infrared ReID. However, current studies primarily focus on single specific tasks, which limits model applicability in real-world scenarios. This paper aims to address this issue by introducing a novel instruct-ReID task that unifies 6 existing ReID tasks in one model and retrieves images based on the provided visual or...
In this paper, we propose an online multi-object tracking (MOT) approach, named DASOTNet, that integrates data association and single object tracking (SOT) in a unified convolutional network (ConvNet). The intuition behind integrating data association and SOT is that they can complement each other. Following the Siamese architecture, DASOTNet consists of a shared feature ConvNet, a data association branch, and an SOT branch. Data association is treated as a special re-identification task and is solved by learning discriminative features for different targets in the data association branch. To handle the problem...
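The abstract frames data association as re-identification with discriminative features but does not say how tracks and detections are matched. A minimal sketch, assuming cosine similarity between appearance embeddings and greedy highest-similarity-first matching with a hypothetical threshold `min_sim` (many trackers use Hungarian matching instead; DASOTNet's actual procedure may differ):

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two appearance embeddings."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def associate(track_feats, det_feats, min_sim=0.5):
    """Greedily pair tracks with detections by descending cosine similarity.

    Returns sorted (track_idx, det_idx) pairs. Tracks or detections left
    unmatched would, in a full tracker, be handled as lost tracks or new targets.
    """
    candidates = [
        (cosine_similarity(t, d), ti, di)
        for ti, t in enumerate(track_feats)
        for di, d in enumerate(det_feats)
    ]
    candidates.sort(reverse=True)  # most similar pairs first
    used_tracks, used_dets, matches = set(), set(), []
    for sim, ti, di in candidates:
        if sim < min_sim:
            break  # all remaining pairs are even less similar
        if ti in used_tracks or di in used_dets:
            continue  # each track and detection is matched at most once
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return sorted(matches)


if __name__ == "__main__":
    tracks = [[1.0, 0.0], [0.0, 1.0]]
    detections = [[0.0, 2.0], [3.0, 0.1], [-1.0, -1.0]]
    print(associate(tracks, detections))
```

In this example the third detection is dissimilar to both tracks, so it stays unmatched and would be a candidate for starting a new track.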
In recent years, person re-identification (ReID) has developed rapidly due to its broad practical applications. Most existing benchmarks assume that the same person wears the same clothes across captured images, while in real-world scenarios a person may change his/her clothes frequently. Thus, the Clothes-Changing ReID (CC-ReID) problem has been introduced and several related benchmarks have been established. CC-ReID is a very difficult task, as the main visual characteristics of the human body, i.e., clothes, differ between query and gallery, and clothes-irrelevant...
Exploiting the relationships between attributes is a key challenge for improving multiple facial attribute recognition. In this work, we are concerned with two types of correlations: spatial and non-spatial relationships. For the spatial correlation, we aggregate attributes with spatial similarity into part-based groups and then introduce Group Attention Learning to generate attention features. On the other hand, to discover the non-spatial relationship, we model a group-based Graph Correlation to explore the affinities between predefined groups. We utilize such affinity...
Video-text retrieval has greatly benefited from massive web video data in recent years, yet its performance is still limited by the weak supervision of uncurated data. In this work, we propose to leverage the well-represented information of each original modality and exploit the complementary information of two views of the same video, i.e., clips and captions, by using one view to obtain positive samples from the neighbors of the other. Respecting the hierarchical organization of real-world data, we further design a hierarchical cross-modal pre-training method (HCP)...
Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based editing. Related methods, such as multi-view clustering, mainly project multi-modal features into a joint feature space. However, multi-modal clue features are usually rather weakly correlated due to the semantic gap arising from modality-specific uniqueness. As a result, these methods are not suitable for person clustering. In this paper, we propose...
The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While previous works focus on the effectiveness of the MLP for unsupervised image classification, where pretraining and evaluation are conducted on the same dataset, we reveal that the MLP projector is also a key...