- Domain Adaptation and Few-Shot Learning
- Face recognition and analysis
- Multimodal Machine Learning Applications
- Adversarial Robustness in Machine Learning
- Advanced Image and Video Retrieval Techniques
- Natural Language Processing Techniques
- Topic Modeling
- Cancer-related molecular mechanisms research
- Biometric Identification and Security
- Image Retrieval and Classification Techniques
- COVID-19 diagnosis using AI
- Advanced Neural Network Applications
- Advanced Graph Neural Networks
- Generative Adversarial Networks and Image Synthesis
- Text and Document Classification Technologies
- Image Enhancement Techniques
- Mathematics, Computing, and Information Processing
- Digital Holography and Microscopy
- Remote-Sensing Image Classification
- Speech Recognition and Synthesis
- Image Processing Techniques and Applications
- Face and Expression Recognition
- Advanced Image Fusion Techniques
- Advanced Image Processing Techniques
- Fire Detection and Safety Systems
Zhejiang University
2022-2025
Vision Transformers (ViTs) have demonstrated powerful representation ability in various visual tasks thanks to their intrinsic data-hungry nature. However, we unexpectedly find that ViTs perform vulnerably when applied to face recognition (FR) scenarios with extremely large datasets. We investigate the reasons for this phenomenon and discover that the existing data augmentation approach and hard sample mining strategy are incompatible with the ViTs-based FR backbone due to the lack of tailored consideration on preserving...
Due to the large-scale image size and object variations, current CNN-based and Transformer-based approaches for remote sensing semantic segmentation are either suboptimal at capturing long-range dependencies or limited by high computational complexity. In this paper, we propose CM-UNet, comprising a CNN-based encoder for extracting local features and a Mamba-based decoder for aggregating and integrating global information, facilitating efficient semantic segmentation of remote sensing images. Specifically, a CSMamba block is introduced to build the core decoder, which employs...
Diffusion models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts involving multiple objects, attribute binding, and long descriptions. In this paper, we propose a novel framework called LLM4GEN, which enhances the semantic understanding of diffusion models by leveraging the representation of Large Language Models (LLMs). It can be seamlessly incorporated into various diffusion models as a plug-and-play component. A specially...
The field of face recognition (FR) has undergone significant advancements with the rise of deep learning. Recently, the success of unsupervised learning and graph neural networks has demonstrated the effectiveness of data structure information. Considering that the FR task can leverage large-scale training data, which intrinsically contains structure information, we aim to investigate how to encode such critical structure information into the latent space. As revealed from our observations, directly aligning the structure information between the input and latent spaces inevitably...
Tremendous breakthroughs have been made in Semi-Supervised Semantic Segmentation (S4) through contrastive learning. However, due to limited annotations, the guidance on unlabeled images is generated by the model itself, which inevitably contains noise and disturbs the unsupervised training process. To address this issue, we propose a robust contrastive-based S4 framework, termed the Probabilistic Representation Contrastive Learning (PRCL) framework, to enhance the robustness of the unsupervised training process. We model the pixel-wise representation...
In the field of human-centric personalized image generation, the adapter-based method obtains the ability to customize and generate portraits by text-to-image training on facial data. This allows for identity-preserved personalization without additional fine-tuning in inference. Although there are improvements in efficiency and fidelity, there is often a significant performance decrease in text-following ability, controllability, and diversity of generated faces compared to the base model. In this paper, we analyze that the degradation...