- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- COVID-19 Diagnosis Using AI
- Medical Image Segmentation Techniques
- Computer Graphics and Visualization Techniques
- Image Retrieval and Classification Techniques
- Network Packet Processing and Optimization
- 3D Shape Modeling and Analysis
- Anomaly Detection Techniques and Applications
- Advanced Image Processing Techniques
- Image Processing and 3D Reconstruction
- Machine Learning and ELM
- Visual Attention and Saliency Detection
- Remote Sensing and LiDAR Applications
- Geophysical Methods and Applications
- International Business and FDI
- Genomics and Phylogenetic Studies
- International Arbitration and Investment Law
- Image and Video Quality Assessment
- Topic Modeling
- 3D Surveying and Cultural Heritage
- Video Surveillance and Tracking Methods
- Imbalanced Data Classification Techniques
Zhejiang University
2021-2024
State Key Laboratory of Clean Energy Utilization
2021
Change detection in remote sensing imagery is a critical technique for Earth observation, primarily focusing on pixel-level segmentation of change regions between bi-temporal images. The essence of change detection lies in determining whether corresponding pixels in the bi-temporal images have changed. In deep learning, the spatial and channel dimensions of feature maps represent different information from the original images. In this study, we found that in change detection tasks, difference information can be computed not only from the spatial dimension of bi-temporal features but also from the channel dimension. Therefore,...
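The snippet says bi-temporal differences can be taken along both the spatial and the channel dimension of feature maps. The sketch below illustrates that idea. It assumes C×H×W feature maps stored as nested lists and uses the mean absolute difference. The function names and the averaging choice are hypothetical illustrations, not the paper's modules.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: C x H x W


def spatial_difference(f1: FeatureMap, f2: FeatureMap) -> List[List[float]]:
    """Per-pixel change map (H x W): mean absolute difference across channels."""
    c, h, w = len(f1), len(f1[0]), len(f1[0][0])
    return [
        [sum(abs(f1[k][i][j] - f2[k][i][j]) for k in range(c)) / c for j in range(w)]
        for i in range(h)
    ]


def channel_difference(f1: FeatureMap, f2: FeatureMap) -> List[float]:
    """Per-channel change vector (C,): mean absolute difference across all pixels."""
    c, h, w = len(f1), len(f1[0]), len(f1[0][0])
    return [
        sum(abs(f1[k][i][j] - f2[k][i][j]) for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]


if __name__ == "__main__":
    # Two 2-channel, 2x2 feature maps; only pixel (0, 1) of channel 0 changes.
    t1 = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
    t2 = [[[0.0, 4.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
    print(spatial_difference(t1, t2))  # [[0.0, 2.0], [0.0, 0.0]]
    print(channel_difference(t1, t2))  # [1.0, 0.0]
```

The spatial view shows *where* a change happened. The channel view shows *which* feature channels respond to it.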
Customized generation has achieved significant progress in image synthesis, yet personalized video generation remains challenging due to temporal inconsistencies and quality degradation. In this paper, we introduce CustomVideoX, an innovative framework leveraging the diffusion transformer for personalized video generation from a reference image. CustomVideoX capitalizes on pre-trained networks by exclusively training LoRA parameters to extract features, ensuring both efficiency and adaptability. To facilitate seamless interaction between...
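CustomVideoX is described as training only LoRA parameters on top of frozen pre-trained networks. The sketch below shows the generic LoRA formulation in plain Python: a frozen weight W0 plus a trainable low-rank update scaled by alpha/r, with B zero-initialized so training starts from the pre-trained behavior. It is not CustomVideoX's code. The class, method, and parameter names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class LoRALinear:
    """Frozen weight W0 (out x in) plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0  # pre-trained weight: kept frozen, never updated
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # r x in
        self.b = [[0.0] * rank for _ in range(d_out)]  # out x r, zero-init => update starts at 0
        self.scale = alpha / rank

    def trainable_parameters(self) -> int:
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return sum(len(r) for r in self.a) + sum(len(r) for r in self.b)

    def effective_weight(self) -> Matrix:
        delta = matmul(self.b, self.a)  # (out x r) @ (r x in) -> out x in
        return [[w + self.scale * d for w, d in zip(wr, dr)] for wr, dr in zip(self.w0, delta)]

    def forward(self, x: List[float]) -> List[float]:
        w = self.effective_weight()
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


if __name__ == "__main__":
    w0 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # toy 4x4 identity
    layer = LoRALinear(w0, rank=1)
    print(layer.forward([1.0, 2.0, 3.0, 4.0]))  # [1.0, 2.0, 3.0, 4.0]: B = 0 at init
    print(layer.trainable_parameters())         # 8 trainable values vs. 16 in W0
```

The zero-initialized B is the design choice that makes the adapted model start exactly at the pre-trained one.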
Auto-regressive models have made significant progress in the realm of text-to-image synthesis, yet devising an appropriate model architecture and training strategy to achieve a satisfactory level remains an important avenue of exploration. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual...
Personalized text-to-image generation methods can generate customized images based on reference images and have garnered wide research interest. Recent methods propose a finetuning-free approach with a decoupled cross-attention mechanism to generate personalized images, requiring no test-time finetuning. However, when multiple reference images are provided, the current decoupled cross-attention mechanism encounters the object confusion problem and fails to map each reference image to its corresponding object, thereby seriously limiting its scope of application. To address the object confusion problem, in this...
Existing image semantic segmentation methods favor learning consistent representations by extracting long-range contextual features with attention, multi-scale, or graph aggregation strategies. These methods usually treat misclassified and correctly classified pixels equally, hence misleading the optimization process and causing inconsistent intra-class pixel feature representations in the embedding space during learning. In this paper, we propose the auxiliary representation calibration head (RCH), which consists of...
Recently, large-scale pre-trained vision-language models have presented benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose category...
Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities...
Though diffusion models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies, they don't generalize well on long-tailed datasets because minority classes lack samples and semantic information. To overcome the aforementioned challenges, we first take a closer look at the collapse of tail category patterns under long-tail distributed data and propose an alternative but easy-to-use and effective solution, the Long-Tailed Bias Solver model for image synthesis (LTB-Solver),...
Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subject in accordance with the textual descriptions; and secondly, the difficulty of achieving a cohesive representation of multiple subjects in a single image without...
Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual information, freezing the textual component while fine-tuning the visual component. This methodology...
Long-tail learning seeks to address the key issue of head classes dominating the training process under extreme class imbalance in real-world circumstances. Data augmentation, which packs a set of augmentation approaches to increase the size and quality of datasets for model training, has been shown to be a worthwhile research topic. However, the long-tail problem cannot be solved using current data augmentation techniques, and the question of how to undertake long-tailed data augmentation more effectively remains unanswered. A diffusion-based method, referred to as DiffuRC, enables...
Video surveillance systems are playing increasingly important roles in our everyday lives. To get meaningful information in a timely and accurate manner, it is vital to optimally allocate computation and communication resources for image classification tasks. In this paper, taking face recognition as an example, we propose a novel end-to-edge collaborative computing system based on a multi-exit network that dynamically distributes inference between the front end (the camera sensor) and the back end (the mobile edge server). With an ε-greedy algorithm...
Long-tailed learning aims to tackle the crucial challenge that head classes dominate the training procedure under severe class imbalance in real-world scenarios. However, little attention has been given to how to quantify the dominance severity in the representation space. Motivated by this, we generalize cosine-based classifiers to a von Mises-Fisher (vMF) mixture model, denoted as the vMF classifier, which enables us to quantitatively measure representation quality upon the hyper-sphere space by calculating the distribution overlap...
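The link between cosine classifiers and a vMF mixture can be made concrete with a worked sketch. Under the simplifying assumptions of a shared concentration κ and equal class priors, the vMF normalizer C_p(κ) cancels. The class posterior then reduces to a softmax over κ·cos(x, μ_k), which is a cosine classifier with logit scale κ. The function names are hypothetical. The snippet does not say how the paper parameterizes κ or computes the distribution overlap, so neither is shown.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]


def vmf_posterior(x: List[float], means: List[List[float]], kappa: float) -> List[float]:
    """Class posteriors of an equal-prior, shared-kappa vMF mixture.

    Density: f(x; mu_k, kappa) = C_p(kappa) * exp(kappa * mu_k . x) for unit x, mu_k.
    With shared kappa and equal priors, C_p(kappa) cancels in Bayes' rule, leaving
    softmax_k(kappa * cos(x, mu_k)), i.e. a cosine classifier scaled by kappa.
    """
    x = normalize(x)
    logits = [kappa * sum(a * b for a, b in zip(normalize(mu), x)) for mu in means]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    means = [[1.0, 0.0], [0.0, 1.0]]  # toy class mean directions
    print(vmf_posterior([1.0, 1.0], means, kappa=10.0))  # [0.5, 0.5]: equidistant point
    print(vmf_posterior([1.0, 0.0], means, kappa=10.0))  # class 0 dominates
```

A larger κ concentrates each class more tightly on the sphere and sharpens the posteriors. This is why the concentration parameter carries information about representation quality.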
Data-free quantization has recently been a promising method to perform quantization without access to the original data. However, a drawback of such approaches is the homogenization of synthetic data, due to low efficiency in diverse data generation and performance collapse of the generator. To alleviate the above issue, we propose a novel Meta-BNS adversarial data-free quantization scheme, which consists of a Meta-BNS module and an exploration module. The Meta-BNS module automatically learns an enhancement coefficient matrix for the BN statistics loss function to provide a suitable constraint on the generator. Adversarial...
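Data-free quantization methods commonly constrain synthetic data with a batch-normalization statistics (BNS) loss. This loss is the squared distance between the batch mean and variance of synthetic activations and the running statistics stored in the pre-trained BN layers. The sketch below shows that common form for one BN layer. The optional per-channel `weights` argument is a hypothetical stand-in for a learned coefficient. It is not the Meta-BNS learning rule, which the snippet does not describe.

```python
from typing import List, Optional


def bns_loss(
    features: List[List[float]],            # activations of a synthetic batch at one BN layer, N x C
    running_mean: List[float],              # BN running mean stored in the pre-trained model, C
    running_var: List[float],               # BN running variance stored in the pre-trained model, C
    weights: Optional[List[float]] = None,  # hypothetical per-channel coefficients (default: all 1)
) -> float:
    """Weighted squared distance between batch statistics and stored BN statistics."""
    n, c = len(features), len(running_mean)
    w = weights if weights is not None else [1.0] * c
    loss = 0.0
    for k in range(c):
        col = [row[k] for row in features]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased (population) batch variance
        loss += w[k] * ((mean - running_mean[k]) ** 2 + (var - running_var[k]) ** 2)
    return loss


if __name__ == "__main__":
    batch = [[1.0, 0.0], [-1.0, 0.0]]  # channel 0: mean 0, var 1; channel 1: mean 0, var 0
    print(bns_loss(batch, [0.0, 0.0], [1.0, 0.0]))              # 0.0: statistics match
    print(bns_loss(batch, [0.0, 0.0], [1.0, 1.0]))              # 1.0: channel 1 variance is off
    print(bns_loss(batch, [0.0, 0.0], [1.0, 1.0], [1.0, 0.5]))  # 0.5: channel 1 down-weighted
```

A full objective would sum this term over every BN layer of the pre-trained network and back-propagate it into the generator.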
While network binarization is a promising method for memory saving and speedup on hardware, it inevitably leads to residual errors in intermediate features, resulting in performance and capability degradation. To alleviate the above issue, we focus on architecture design to find a more suitable structure for the extreme-low-bit scenario. In this paper, we propose a baseline-auxiliary structure that compensates the features by searching auxiliary branches guided by feature similarity and confidence score. The feature maps are reasonably enhanced...
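The residual error the snippet refers to can be seen with a common binarizer, b = α·sign(x) with α = mean(|x|). This is the XNOR-Net-style choice, swapped in here for illustration. The `two_branch` function is a hypothetical, generic residual-binarization stand-in: it binarizes the leftover error once more and adds it back. It is not the searched auxiliary branches the paper proposes.

```python
from typing import List, Tuple


def binarize(x: List[float]) -> Tuple[List[float], List[float]]:
    """Scaled sign binarization b = alpha * sign(x), alpha = mean(|x|).

    Returns the binarized values and the residual error x - b that is lost.
    """
    alpha = sum(abs(v) for v in x) / len(x)
    b = [alpha if v >= 0 else -alpha for v in x]  # sign(0) mapped to +1 by convention
    residual = [v - bv for v, bv in zip(x, b)]
    return b, residual


def two_branch(x: List[float]) -> List[float]:
    """Hypothetical auxiliary branch: binarize the residual again and add it back."""
    b1, r1 = binarize(x)
    b2, _ = binarize(r1)
    return [u + v for u, v in zip(b1, b2)]


if __name__ == "__main__":
    x = [3.0, -1.0, 1.0, -3.0]
    b, r = binarize(x)
    print(b)              # [2.0, -2.0, 2.0, -2.0]
    print(r)              # [1.0, 1.0, -1.0, -1.0]: residual error of a single branch
    print(two_branch(x))  # [3.0, -1.0, 1.0, -3.0]: second branch recovers it here
```

On this toy input, the second branch cancels the residual exactly. In general it only reduces the residual, at the cost of extra binary operations.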