- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Image Retrieval and Classification Techniques
- Face and Expression Recognition
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Remote-Sensing Image Classification
- Anomaly Detection Techniques and Applications
- Sparse and Compressive Sensing Techniques
- Multimodal Machine Learning Applications
- Face Recognition and Analysis
- Visual Attention and Saliency Detection
- Gait Recognition and Analysis
- Image Enhancement Techniques
- Text and Document Classification Technologies
- Advanced Image Processing Techniques
- Microbial Community Ecology and Physiology
- Image Processing Techniques and Applications
- Human Mobility and Location-Based Analysis
- Advanced Algorithms and Applications
- Image and Signal Denoising Methods
- Hand Gesture Recognition Systems
Pudong Medical Center (2025)
Fudan University (2025)
Northeastern University (2025)
Nanjing University of Science and Technology (2015-2024)
Lanzhou City University (2024)
Tianjin University of Technology (2024)
Jiangsu University (2019-2023)
Nanjing University of Information Science and Technology (2018-2023)
Huazhong University of Science and Technology (2013-2016)
National University of Singapore (2014-2015)
Recently, very deep convolutional neural networks (CNNs) have been attracting considerable attention in image restoration. However, as the depth grows, the long-term dependency problem is rarely realized for these very deep models, which results in the prior states/layers having little influence on the subsequent ones. Motivated by the fact that human thoughts have persistency, we propose a very deep persistent memory network (MemNet) that introduces a memory block, consisting of a recursive unit and a gate unit, to explicitly mine persistent memory through an adaptive...
In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly predict depth, surface normal and semantic segmentation. The motivation behind it comes from the statistical observation that pattern-affinitive pairs recur frequently across different tasks as well as within a task. Thus, we can conduct two types of propagation, cross-task propagation and task-specific propagation, to adaptively diffuse those similar patterns. The former integrates cross-task affinity patterns to adapt to each task...
In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces the pixel-wise categorization in an end-to-end way. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines the global semantic structure and the local fine details within...
Variations of human body skeletons may be considered as dynamic graphs, which are a generic data representation for numerous real-world applications. In this paper, we propose a spatio-temporal graph convolution (STGC) approach that combines the successes of local convolutional filtering with the sequence learning ability of the autoregressive moving average. To encode dynamic graphs, the constructed multi-scale local graph convolution filters, consisting of matrices of local receptive fields and signal mappings, are recursively performed on structured graph data of the temporal and spatial...
Frame reconstruction (current or future frame) based on Auto-Encoder (AE) is a popular method for video anomaly detection. With models trained on the normal data, the reconstruction errors of anomalous scenes are usually much larger than those of normal ones. Previous methods introduced a memory bank into the AE for encoding diverse normal patterns across the training videos. However, they are memory-consuming and cannot cope with unseen new scenarios in the testing data. In this work, we propose a dynamic prototype unit (DPU) to encode the normal dynamics as...
In this work, we introduce a novel feature-attentioned object detection framework to boost detection performance in remote sensing imagery, which can focus on learning intrinsic representations from different aspects in an end-to-end framework. Firstly, when fusing multi-scale visual features of the backbone network, we adopt channel-wise and pixel-wise attentions to enhance object-related representations and weaken background/noise information. Secondly, an adaptive multiple receptive fields attention mechanism is...
This work studies the Generalized Singular Value Thresholding (GSVT) operator associated with a nonconvex function g defined on the singular values of X. We prove that GSVT can be obtained by performing the proximal operator of g (denoted as Prox_g(·)) on the singular values, since Prox_g(·) is monotone when g is lower bounded. If the nonconvex g satisfies some conditions (many popular nonconvex surrogate functions of the l0-norm, e.g., the lp-norm with 0 < p < 1, are special cases), a general solver to find Prox_g(b) is proposed for any b ≥ 0. GSVT greatly generalizes the known Singular Value Thresholding (SVT), which is a basic subroutine in many...
Rectified linear activation units are important components for state-of-the-art deep convolutional networks. In this paper, we propose a novel S-shaped rectified linear activation unit (SReLU) to learn both convex and non-convex functions, imitating the multiple function forms given by the two fundamental laws, namely the Weber-Fechner law and the Stevens law, in psychophysics and neural sciences. Specifically, SReLU consists of three piecewise linear functions, which are formulated by four learnable parameters. The SReLU is learned jointly with...
Recently, very deep convolutional neural networks (CNNs) have been attracting considerable attention in image restoration. However, as the depth grows, the long-term dependency problem is rarely realized for these very deep models, which results in the prior states/layers having little influence on the subsequent ones. Motivated by the fact that human thoughts have persistency, we propose a very deep persistent memory network (MemNet) that introduces a memory block, consisting of a recursive unit and a gate unit, to explicitly mine persistent memory through an adaptive...
Motivated by our observation on RGB-T data that pattern correlations recur frequently across modalities and also along sequence frames, we propose a cross-modal pattern-propagation (CMPP) tracking framework to diffuse instance patterns across RGB-T data in the spatial domain as well as the temporal domain. To bridge the RGB-T modalities, cross-modal correlations on intra-modal paired pattern-affinities are derived to reveal the latent cues between heterogeneous modalities. Through these correlations, useful patterns may be mutually propagated between RGB-T modalities so as to fulfill...
Rectified linear activation units are important components for state-of-the-art deep convolutional networks. In this paper, we propose a novel S-shaped rectified linear activation unit (SReLU) to learn both convex and non-convex functions, imitating the multiple function forms given by the two fundamental laws, namely the Weber-Fechner law and the Stevens law, in psychophysics and neural sciences. Specifically, SReLU consists of three piecewise linear functions, which are formulated by four learnable parameters. The SReLU is learned jointly with the training...
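The SReLU abstract describes three piecewise linear segments governed by four parameters but does not give the functional form. The following is a minimal sketch, assuming the commonly cited form with two thresholds and two outer slopes around an identity middle segment. The function name `srelu` and the parameter names `t_l`, `a_l`, `t_r`, `a_r` are illustrative assumptions, and the parameters are fixed scalars here rather than learned.

```python
import numpy as np

def srelu(x, t_l, a_l, t_r, a_r):
    """S-shaped rectified linear unit (illustrative sketch, not the authors' code).

    Assumed form:
      x >= t_r       -> t_r + a_r * (x - t_r)   (right segment, slope a_r)
      t_l < x < t_r  -> x                       (identity middle segment)
      x <= t_l       -> t_l + a_l * (x - t_l)   (left segment, slope a_l)
    """
    x = np.asarray(x, dtype=float)
    right = t_r + a_r * (x - t_r)
    left = t_l + a_l * (x - t_l)
    return np.where(x >= t_r, right, np.where(x <= t_l, left, x))

# Hypothetical parameter values chosen only for illustration.
y = srelu([-2.0, 0.0, 3.0], t_l=-1.0, a_l=0.5, t_r=1.0, a_r=2.0)
```

In a trained network the four parameters would be learned by back propagation together with the other weights, as the abstract states. With `t_l = 0`, `a_l = 0` and `a_r = 1` the sketch reduces to the ordinary ReLU.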
Inspired by the observation that pattern structures recur frequently within a task and also across tasks, we propose a pattern-structure diffusion (PSD) framework to mine and propagate task-specific and cross-task pattern structures in the task-level space for joint depth estimation, segmentation and surface normal prediction. To represent local pattern structures, we model them as small-scale graphlets and propagate them in two different ways, i.e., intra-task and inter-task PSD. For the former, to overcome the limit of the locality of pattern structures, we use the high-order recursive aggregation on...
A proper strategy to alleviate overfitting is critical to a deep neural network (DNN). In this paper, we introduce cross-loss-function regularization for boosting the generalization capability of the DNN, which results in the multi-loss regularized DNN (ML-DNN) framework. For a particular learning task, e.g., image classification, only a single loss function is used in all previous DNNs, and the intuition behind the multi-loss framework is that the extra loss functions with different theoretical motivations (e.g., pairwise...
In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates the cross-layer context, global image-level context, semantic edge context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces the pixel-wise categorization in an end-to-end way. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines the global semantic information and the local fine details...
Object detection in remote sensing imagery is a critical yet challenging task in the field of computer vision due to the bird's-eye-view perspective. Although existing object detection approaches have achieved great advances through the utilization of deep features or rotation proposals, they give insufficient consideration to multilevel semantic information and its propagation for guiding the learning process. Accordingly, in this article, we propose a hierarchical semantic propagation (HSP) framework to boost object detection performance in remote sensing imagery, which can better...
This work studies the Generalized Singular Value Thresholding (GSVT) operator $\text{Prox}_{g}^{\sigma}(\cdot)$, \begin{equation*} \text{Prox}_{g}^{\sigma}(B)=\arg\min\limits_{X}\sum_{i=1}^{m}g(\sigma_{i}(X)) + \frac{1}{2}\|X-B\|_{F}^{2}, \end{equation*} associated with a nonconvex function $g$ defined on the singular values of $X$. We prove that GSVT can be obtained by performing the proximal operator of $g$ (denoted as $\text{Prox}_g(\cdot)$) on the singular values since $\text{Prox}_g(\cdot)$ is monotone when $g$ is lower bounded. If the nonconvex $g$ satisfies some...
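The abstract's central claim, that the matrix operator reduces to a scalar proximal operator applied to the singular values of $B$, can be illustrated with a short sketch. The helper names `gsvt` and `soft_threshold` are assumptions introduced here, and the scalar prox shown is the convex soft-thresholding case (classical SVT), not the general nonconvex solver proposed in the paper.

```python
import numpy as np

def soft_threshold(s, lam):
    """Scalar prox of g(x) = lam * |x| for x >= 0: shrink toward zero by lam."""
    return np.maximum(s - lam, 0.0)

def gsvt(B, prox):
    """Apply a scalar proximal operator to the singular values of B.

    `prox` is any elementwise function on nonnegative singular values.
    It is assumed to be monotone, so the ordering of the values is kept.
    """
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Scale the columns of U by the thresholded singular values, then recombine.
    return (U * prox(s)) @ Vt

# Illustrative input: singular values 3 and 1, threshold 2 (values chosen arbitrarily).
B = np.diag([3.0, 1.0])
X = gsvt(B, lambda s: soft_threshold(s, 2.0))
```

For a nonconvex $g$, a different scalar routine would be passed as `prox`. That routine is not reproduced here.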
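We propose a novel end-to-end deep architecture for face landmark detection, based on a deep convolutional and deconvolutional network followed by carefully designed recurrent network structures. The pipeline of this architecture consists of three parts. Through the first part, we encode an input face image to resolution-preserved deconvolutional feature maps via a deep network with stacked convolutional and deconvolutional layers. Then, in the second part, we estimate the initial coordinates of the facial key points by an additional convolutional layer on top of these feature maps. In the last part, using the feature maps and the initial facial key points as input, we refine the coordinates of the facial key points with a recurrent network that consists of multiple long short-term...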
Visual relationship detection can bridge the gap between computer vision and natural language for scene understanding of images. Different from pure object recognition tasks, the relation triplets of subject-predicate-object lie in an extremely diverse space, such as person-behind-person and car-behind-building, while suffering from the problem of combinatorial explosion. In this paper, we propose a context-dependent diffusion network (CDDN) framework to deal with visual relationship detection. To capture the interactions...