- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Machine Learning and ELM
- Advanced Image Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Machine Learning and Data Classification
- Viral Infections and Vectors
- Advanced Vision and Imaging
- Viral Infections and Outbreaks Research
- Generative Adversarial Networks and Image Synthesis
- Robotics and Sensor-Based Localization
- Cancer-related molecular mechanisms research
- EEG and Brain-Computer Interfaces
- 3D Shape Modeling and Analysis
- Non-Invasive Vital Sign Monitoring
- Gaussian Processes and Bayesian Inference
- Image Enhancement Techniques
- 3D Surveying and Cultural Heritage
- Advanced Sensor and Control Systems
- Speech Recognition and Synthesis
- Blind Source Separation Techniques
- Indoor and Outdoor Localization Technologies
- Image Processing and 3D Reconstruction
Chinese Academy of Sciences
2022-2024
University of Science and Technology Beijing
2023-2024
Institute of Computing Technology
2017-2024
University of Chinese Academy of Sciences
2017-2020
The learning of the deep networks largely relies on data with human-annotated labels. In some label insufficient situations, performance degrades decision boundary high density. A common solution is to directly minimize Shannon Entropy, but side effect caused by entropy minimization, \it i.e., reduction prediction diversity, mostly ignored. To address this issue, we reinvestigate structure classification output matrix a randomly selected batch. We find theoretical analysis that...
In unsupervised domain adaptation, rich domain-specific characteristics bring great challenge to learn domain-invariant representations. However, discrepancy is considered be directly minimized in existing solutions, which difficult achieve practice. Some methods alleviate the difficulty by explicitly modeling and parts representations, but adverse influence of explicit construction lies residual constructed this paper, we equip adversarial adaptation with Gradually Vanishing Bridge (GVB)...
In multimedia analysis, the task of domain adaptation is to adapt feature representation learned in source with rich label information target less or even no information. Significant research endeavors have been devoted aligning distributions between and domains top fully connected layers based on unsupervised DNN-based models. However, has arbitrarily constrained near output ends DNN models, which thus brings about inadequate knowledge transfer process, especially input end. We develop an...
In image-sentence retrieval task, correlated images and sentences involve different levels of semantic relevance. However, existing multi-modal representation learning paradigms fail to capture the meaningful component relation on word phrase level, while attention-based methods still suffer from component-level mismatching huge computation burden. We propose a Joint Global Co-Attentive Representation method (JGCAR) for retrieval. formulate global task which utilizes both intra-modal...
We address the unsupervised open domain recognition (UODR) problem, where categories in labeled source S is only a subset of those unlabeled target T. The task to correctly classify all samples T including known and unknown categories. UODR challenging due discrepancy, which becomes even harder bridge when large number exist Moreover, classification rules propagated by graph CNN (GCN) may be distracted lack generalization capability. To measure discrepancy for asymmetric label space between...
The image-to-video adaptation task seeks to effectively harness both labeled images and unlabeled videos for achieving effective video recognition. modality gap of the image modalities domain discrepancy across two domains are essential challenges in this task. Existing methods reduce via close-set techniques, resulting inaccurate alignment as there exist outlier target frames. To tackle issue, we extend vanilla classifier with classes, where each class responsible capturing frames a...
Semi-supervised Domain adaptation (SSDA) has shown promising results by leveraging unlabeled data and limited labeled samples in the target domain. However, accessibility to source is hindered privacy concerns, giving rise Source Hypothesis Transfer (SSHT). Integrating SSDA methods directly into SSHT tasks straightforward but poses two significant challenges: i) The hypothesis (classifier) no longer supervised labels, relying on only a few labels may result collapse; ii) Trained models often...
In this paper, we address unsupervised domain adaptation under noisy environments, which is more challenging and practical than traditional adaptation. scenario, the model prone to overfitting labels, resulting in a pronounced shift notable decline overall performance. Previous methods employed prototype for on robust feature spaces. However, these approaches struggle effectively classify classes with similar features environments. To issue, propose new method detect correct confusing class...
Due to the domain discrepancy in visual adaptation, performance of source model degrades when bumping into high data density near decision boundary target domain. A common solution is minimize Shannon Entropy push away from area. However, entropy minimization also leads severe reduction prediction diversity, and unfortunately brings harm adaptation. In this paper, we investigate discriminability diversity by studying structure classification output matrix a randomly selected batch. We find...
In this paper, we tackle the task of domain adaptation under noisy environments; is a practical and challenging problem in which source corrupted with noise its labels, features, or both. Noise leads to inaccurate visual representations makes it harder estimate reduce discrepancy between target domains, resulting severe performance degradation domain. These challenges can be addressed offline sample selection following robust reduction. To achieve reliable selection, model uncertainty...
The bottleneck of visual domain adaptation always lies in the learning invariant representations. In this paper, we present a simple but effective technique named Adaptive Feature Swapping for features Unsupervised Domain Adaptation (UDA). aims to select semantically irrelevant from labeled source data and unlabeled target swap these with each other. Then merged representations are also utilized training prediction consistency constraints. way, model is encouraged learn that robust...
The learning of the deep networks largely relies on data with human-annotated labels. In some label insufficient situations, performance degrades decision boundary high density. A common solution is to directly minimize Shannon Entropy, but side effect caused by entropy minimization, i.e., reduction prediction diversity, mostly ignored. To address this issue, we reinvestigate structure classification output matrix a randomly selected batch. We find theoretical analysis that discriminability...
We address the image-to-video adaptation task that aims to leverage labeled images and unlabeled videos for video recognition. There are two major challenges in this task, including domain discrepancy between domains, modality gap image modalities. Existing methods mainly employ a two-stage paradigm by first adopting frame-level reduce then learning spatio-temporal model bridge gap. In paper, we provide new perspective propose single-stage method synthesizes from source static converts...
Zero-shot video classification (ZSVC) that aims to recognize classes have never been seen during model training, has become a thriving research direction. ZSVC is achieved by building mappings between visual and semantic embeddings. Recently, automatically mining the underlying objects in videos as attributes incorporating external commonsense knowledge. However, object mined from categories can not generalized unseen ones. Besides, category-object relationships are usually extracted...
Due to the rapid growth of online video data, video-text retrieval techniques are in urgent need, which aim search for most relevant given a natural language caption and vice versa. The major challenge this task is how identify true fine-grained semantic correspondence between videos texts, using only document-level correspondence. To deal with issue, we propose simple yet effective two-stream framework takes concept information into account introduces new branch semantic-level matching. We...
Zero-shot video recognition (ZSVR) is a task that aims to recognize categories have not been seen during the model training process. Recently, vision-language models (VLMs) pre-trained on large-scale image-text pairs demonstrated impressive transferability for ZSVR. To make VLMs applicable domain, existing methods often use an additional temporal learning module after image-level encoder learn relationships among frames. Unfortunately, from unseen categories, we observe abnormal phenomenon...
In unsupervised domain adaptation, rich domain-specific characteristics bring great challenge to learn domain-invariant representations. However, discrepancy is considered be directly minimized in existing solutions, which difficult achieve practice. Some methods alleviate the difficulty by explicitly modeling and parts representations, but adverse influence of explicit construction lies residual constructed this paper, we equip adversarial adaptation with Gradually Vanishing Bridge (GVB)...
In this paper, we address unsupervised domain adaptation under noisy environments, which is more challenging and practical than traditional adaptation. scenario, the model prone to overfitting labels, resulting in a pronounced shift notable decline overall performance. Previous methods employed prototype for on robust feature spaces. However, these approaches struggle effectively classify classes with similar features environments. To issue, propose new method detect correct confusing class...