- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Image and Signal Denoising Methods
- Advanced Image Processing Techniques
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Video Surveillance and Tracking Methods
- Neural Networks and Applications
- Anomaly Detection Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Adversarial Robustness in Machine Learning
- Image Processing Techniques and Applications
- Image Enhancement Techniques
- Composite Material Mechanics
- Medical Image Segmentation Techniques
- Advanced Data Compression Techniques
- Remote Sensing and LiDAR Applications
- Gait Recognition and Analysis
- Power Systems Fault Detection
- Metabolomics and Mass Spectrometry Studies
- Nutritional Studies and Diet
- Radiomics and Machine Learning in Medical Imaging
- Blind Source Separation Techniques
- 3D Shape Modeling and Analysis
East China Normal University
2021-2025
Chongqing University of Science and Technology
2025
Shanghai Jiao Tong University
2022-2024
Shanghai Mental Health Center
2022-2024
Ministry of Education of the People's Republic of China
2023-2024
Yangtze University
2024
Shanghai Ninth People's Hospital
2022-2024
National University of Singapore
2019-2020
Xiamen University
2016-2019
Fuzhou University
2017
Single image dehazing is a challenging ill-posed problem due to the severe information degeneration. However, existing deep learning based dehazing methods only adopt clear images as positive samples to guide the training of the dehazing network, while negative information is unexploited. Moreover, most of them focus on strengthening the dehazing network with an increase of depth and width, leading to a significant requirement of computation and memory. In this paper, we propose a novel contrastive regularization (CR) built upon contrastive learning to exploit the information of both hazy images and clear images as negative and positive samples, respectively. CR...
Structured pruning of filters or neurons has received increased focus for compressing convolutional neural networks. Most existing methods rely on multi-stage optimizations in a layer-wise manner, iteratively pruning and retraining, which may not be optimal and may be computation intensive. Besides, these methods are designed for pruning a specific structure, such as filter or block structures, without jointly pruning heterogeneous structures. In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner. To...
Accelerating convolutional neural networks has recently received ever-increasing research focus. Among various approaches proposed in the literature, filter pruning has been regarded as a promising solution, owing to its advantage of significant speedup and memory reduction for both the network model and the intermediate feature maps. To this end, most approaches tend to prune filters in a layer-wise fixed manner, which is incapable of dynamically recovering a previously removed filter, as well as of jointly optimizing the pruned network across layers. In this paper,...
The success of convolutional neural networks (CNNs) in computer vision applications has been accompanied by a significant increase in computation and memory costs, which prohibits their usage in resource-limited environments, such as mobile systems or embedded devices. To this end, research on CNN compression has recently emerged. In this paper, we propose a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speed up the computation and reduce the memory overhead of CNNs, which can be well...
Convolutional neural networks (CNNs) have achieved remarkable success in various computer vision tasks, and are extremely powerful in dealing with massive training data by using tens of millions of parameters. However, CNNs often incur significant memory and computation costs, which prohibits their usage in resource-limited environments such as mobile or embedded devices. To address the above issues, existing approaches typically focus on either accelerating the convolutional layers or compressing...
Compressing convolutional neural networks (CNNs) has received ever-increasing research focus. However, most existing CNN compression methods do not interpret their inherent structures to distinguish the implicit redundancy. In this paper, we investigate the problem of CNN compression from a novel interpretable perspective. The relationship between the input feature maps and 2D kernels is revealed in a theoretical framework, based on which a kernel sparsity and entropy (KSE) indicator is proposed to quantitate the feature map importance...
The Information Bottleneck (IB) provides an information theoretic principle for representation learning, by retaining all information relevant for predicting the label while minimizing the redundancy. Though the IB principle has been applied to a wide range of applications, its optimization remains a challenging problem which heavily relies on the accurate estimation of mutual information. In this paper, we present a new strategy, Variational Self-Distillation (VSD), which provides a scalable, flexible and analytic solution to essentially fitting the mutual information but...
Channel pruning and tensor decomposition have received extensive attention in convolutional neural network compression. However, these two techniques are traditionally deployed in an isolated manner, leading to a significant accuracy drop when pursuing high compression rates. In this paper, we propose a Collaborative Compression (CC) scheme, which joins channel pruning and tensor decomposition to compress CNN models by simultaneously learning the model sparsity and low-rankness. Specifically, we first investigate the compression sensitivity of each...
Video anomaly detection aims to automatically identify unusual objects or behaviours by learning from normal videos. Previous methods tend to use simplistic reconstruction or prediction constraints, which leads to the insufficiency of learned representations for normal data. As such, we propose a novel bi-directional architecture with three consistency constraints to comprehensively regularize the prediction task at the pixel-wise, cross-modal, and temporal-sequence levels. First, predictive consistency is proposed to consider the symmetry property...
U-Nets have achieved tremendous success in medical image segmentation. Nevertheless, they may have limitations in global (long-range) contextual interactions and edge-detail preservation. In contrast, the Transformer module has an excellent ability to capture long-range dependencies by leveraging the self-attention mechanism in the encoder. Although the Transformer module was born to model the long-range dependency on the extracted feature maps, it still suffers from high computational and spatial complexities in processing high-resolution 3D feature maps. This motivates us...
Cross-modality fusion of complementary information from different modalities effectively improves object detection performance, making it more useful and robust for a wider range of applications. Existing fusion strategies combine different types of images or merge different backbone features through elaborate neural network modules. However, these methods neglect that modality disparities affect cross-modality fusion performance, as modalities with different camera focal lengths, placements, and angles are hardly fused. In this paper, we investigate cross-modality fusion by...
Convolutional neural networks (CNNs) are highly successful for super-resolution (SR) but often require sophisticated architectures with heavy memory cost and computational overhead, which significantly restricts their practical deployment on resource-limited devices. In this paper, we propose a novel contrastive self-distillation (CSD) framework to simultaneously compress and accelerate various off-the-shelf SR models. In particular, a channel-splitting super-resolution network can first be constructed from a target teacher...
Continual learning aims to enable a model to incrementally learn knowledge from sequentially arrived data. Previous works adopt the conventional classification architecture, which consists of a feature extractor and a classifier. The feature extractor is shared across tasks or classes, but one specific group of weights of the classifier corresponding to a new class should be expanded. Consequently, the parameters of a continual learner gradually increase. Moreover, as the classifier contains all historical classes, a certain size of memory is usually required to store...
The surge of interest towards Multi-modal Large Language Models (MLLMs), e.g., GPT-4V(ision) from OpenAI, has marked a significant trend in both academia and industry. They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks. Very recently, Google released Gemini, its newest and most capable MLLM built from the ground up for multi-modality. In light of the superior reasoning capabilities, can Gemini challenge GPT-4V's leading position in multi-modal learning?...
Despite weakly supervised object detection (WSOD) being a promising step toward evading strong instance-level annotations, its capability is confined to closed-set categories within a single training dataset. In this paper, we propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD to detect novel concepts and utilize diverse datasets with only image-level annotations. To achieve this, we explore three vital strategies, including dataset-level feature adaptation, salient...