- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Topic Modeling
- Generative Adversarial Networks and Image Synthesis
- Visual Attention and Saliency Detection
- Image Retrieval and Classification Techniques
- Anomaly Detection Techniques and Applications
- Face recognition and analysis
- Image Enhancement Techniques
- Natural Language Processing Techniques
- Adversarial Robustness in Machine Learning
- Face and Expression Recognition
- Image Processing Techniques and Applications
- Video Analysis and Summarization
- Text and Document Classification Technologies
- Image and Signal Denoising Methods
- COVID-19 diagnosis using AI
- Robotics and Sensor-Based Localization
- Advanced Image Fusion Techniques
Sun Yat-sen University
2016-2025
Xiamen University
2025
Chinese Academy of Agricultural Sciences
2014-2025
University of Turin
2025
Shanghai Sixth People's Hospital
2025
Fujian Medical University
2025
Guangdong University of Technology
2024
CRRC (China)
2024
Central Hospital of Wuhan
2024
Chinese People's Liberation Army
2024
This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low resolution image) with a focus on the proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) that were learnable through low and high res train images. Each competition had about 100 registered participants and 20 teams...
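Track 1's "standard bicubic downscaling" is commonly understood as the Keys cubic convolution kernel with a = -0.5 (the setting used by MATLAB's imresize). The sketch below is a 1D illustration under stated assumptions, not the challenge's reference code: pixel-centre coordinates, the kernel stretched by the scale factor for antialiasing, and border samples clamped (MATLAB mirrors them instead). The function names are hypothetical.

```python
import math


def cubic(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 matches MATLAB's imresize."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0


def bicubic_downscale_1d(signal, factor):
    """Downscale a 1D signal by an integer factor with an antialiased cubic kernel.

    Assumptions: pixel-centre coordinates, kernel stretched by `factor`
    (antialiasing), and border indices clamped rather than mirrored.
    """
    n_out = len(signal) // factor
    radius = 2 * factor  # kernel is nonzero for |x| < 2, stretched by the factor
    out = []
    for i in range(n_out):
        # Position of output sample i expressed in input coordinates.
        center = (i + 0.5) * factor - 0.5
        acc, wsum = 0.0, 0.0
        for j in range(math.floor(center - radius), math.ceil(center + radius) + 1):
            w = cubic((center - j) / factor)
            k = min(max(j, 0), len(signal) - 1)  # clamp indices at the borders
            acc += w * signal[k]
            wsum += w
        out.append(acc / wsum)  # normalise so the weights sum to 1
    return out
```

Because the kernel is separable, a 2D image would be downscaled by applying this along every row and then along every column.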
Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates. However, this is different from real-world scenarios, where the annotations of pedestrian bounding boxes are unavailable and the target person needs to be searched from a gallery of whole scene images. To close the gap, we propose a new deep learning framework for person search. Instead of breaking it down into two separate tasks, pedestrian detection and person re-identification, we jointly handle both aspects in a single...
The tradeoff between receptive field size and efficiency is a crucial issue in low level vision. Plain convolutional neural networks (CNNs) generally enlarge the receptive field at the expense of computational cost. Recently, dilated filtering has been adopted to address this issue. But it suffers from the gridding effect, and the resulting receptive field is only a sparse sampling of the input image with checkerboard patterns. In this paper, we present a novel multi-level wavelet CNN (MWCNN) model for a better tradeoff between receptive field size and computational efficiency. With the modified U-Net architecture, the wavelet transform...
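The point of using a wavelet transform where a U-Net would normally pool is that it halves spatial resolution without discarding information: the four subbands can be inverted exactly. A minimal sketch of one level of a 2D Haar transform on a list-of-lists image follows; the choice of Haar filters and the helper names are assumptions for illustration, and subband naming conventions (LH vs. HL) vary between sources.

```python
def haar_dwt2(img):
    """One-level 2D Haar DWT of an image with even height and width.

    Returns four half-resolution subbands (LL, LH, HL, HH); the 1/2 scaling
    keeps the transform orthonormal, so signal energy is preserved.
    """
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2   # local average
            lh[i // 2][j // 2] = (-a - b + c + d) / 2  # vertical change
            hl[i // 2][j // 2] = (-a + b - c + d) / 2  # horizontal change
            hh[i // 2][j // 2] = (a - b - c + d) / 2   # diagonal change
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, reconstructing the full-resolution image exactly."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, v, u, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s - v - u + d) / 2
            img[2 * i][2 * j + 1] = (s - v + u - d) / 2
            img[2 * i + 1][2 * j] = (s + v - u - d) / 2
            img[2 * i + 1][2 * j + 1] = (s + v + u + d) / 2
    return img
```

In a network, the four subbands would be stacked as channels and fed to the next convolution, and the inverse transform would replace upsampling on the decoder side.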
We present an algorithm for synthesizing textures from an input sample. This patch-based sampling algorithm is fast and it makes high-quality texture synthesis a real-time process. For generating textures of the same size and comparable quality, patch-based sampling is orders of magnitude faster than existing algorithms. The patch-based sampling algorithm works well for a wide variety of textures ranging from regular to stochastic. By sampling patches according to a nonparametric estimation of the local conditional MRF density function, we avoid mismatching features across patch boundaries. We also experimented with...
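A simplified sketch of the patch-sampling idea, restricted to a single row of patches: each new patch is drawn at random from the sample patches whose left boundary zone matches the right boundary zone of the previous patch within a tolerance, falling back to the best match when no candidate qualifies. The mean-squared boundary distance, the tolerance parameter, and the function names are assumptions for illustration; blending across the overlap and the full 2D scan order are omitted.

```python
import random


def extract_patches(sample, size):
    """All size x size patches of a 2D grayscale sample (list of lists)."""
    h, w = len(sample), len(sample[0])
    return [[row[x:x + size] for row in sample[y:y + size]]
            for y in range(h - size + 1) for x in range(w - size + 1)]


def zone_distance(left, right, overlap):
    """Mean squared difference between the right edge of `left` and the left edge of `right`."""
    total, n = 0.0, 0
    for lrow, rrow in zip(left, right):
        for k in range(overlap):
            diff = lrow[len(lrow) - overlap + k] - rrow[k]
            total += diff * diff
            n += 1
    return total / n


def synthesize_row(sample, size, overlap, n_patches, tol, rng=None):
    """Paste patches left to right; each new patch is drawn at random from the
    candidates whose boundary zone matches the previous patch within `tol`."""
    rng = rng or random.Random()
    patches = extract_patches(sample, size)
    row = [rng.choice(patches)]
    for _ in range(n_patches - 1):
        dists = [zone_distance(row[-1], p, overlap) for p in patches]
        candidates = [p for p, d in zip(patches, dists) if d <= tol]
        if not candidates:  # no patch within tolerance: fall back to the best match
            candidates = [patches[dists.index(min(dists))]]
        row.append(rng.choice(candidates))
    return row
```

Choosing randomly among all acceptable candidates, rather than always taking the single best match, is what keeps the output from repeating one patch over and over.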
Recent successes in learning-based image classification, however, heavily rely on a large number of annotated training samples, which may require considerable human effort. In this paper, we propose a novel active learning framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner. Our approach advances the existing active learning methods in two aspects. First, we incorporate deep convolutional neural networks into active learning....
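The abstract is cut off before it states how samples are chosen for annotation. For orientation only, here is a sketch of a common generic active learning step, uncertainty sampling by predictive entropy; it is not claimed to be this paper's criterion, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, k):
    """Return indices of the k unlabeled samples with the highest predictive entropy."""
    order = sorted(range(len(predictions)),
                   key=lambda i: entropy(predictions[i]),
                   reverse=True)
    return order[:k]
```

In an incremental loop, the selected samples would be sent to an annotator, added to the labeled pool, and the classifier retrained before the next round.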
We propose Stochastic Neural Architecture Search (SNAS), an economical end-to-end solution to Neural Architecture Search (NAS) that trains neural operation parameters and architecture distribution parameters in the same round of back-propagation, while maintaining the completeness and differentiability of the NAS pipeline. In this work, NAS is reformulated as an optimization problem on the parameters of a joint distribution for the search space in a cell. To leverage the gradient information in a generic differentiable loss for architecture search, a novel search gradient is proposed. We prove that this search gradient optimizes the same objective...
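Training operation weights and the architecture distribution "in the same round of back-propagation" requires a differentiable stand-in for sampling a discrete operation. A standard device for this is the Gumbel-softmax (concrete) relaxation, sketched below in plain Python; treat it as an illustration of the relaxation, not as the SNAS implementation. The helper names and the toy operations are assumptions.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample over candidate operations.

    Lower temperatures push the sample closer to a hard one-hot choice.
    """
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep U strictly inside (0, 1)
        gumbel = -math.log(-math.log(u))              # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, ops, weights):
    """Output of an edge: candidate operations combined by the sampled weights."""
    return sum(w * op(x) for w, op in zip(weights, ops))
```

Usage: with `ops = [lambda x: x, lambda x: 2 * x, lambda x: 0.0]` standing in for identity, a toy "convolution", and the zero operation, `mixed_op(3.0, ops, gumbel_softmax([1.0, 0.5, -0.2], 0.5, random.Random(0)))` gives an edge output through which gradients could reach the logits in an autodiff framework.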
We consider the single image super-resolution problem in a more general case where the low-/high-resolution pairs and the down-sampling process are unavailable. Different from the traditional super-resolution formulation, the low-resolution input is further degraded by noise and blurring. This complicated setting makes supervised learning and accurate kernel estimation impossible. To solve this problem, we resort to unsupervised learning without paired data, inspired by recent successful image-to-image translation applications. With generative...
Human parsing has recently attracted a lot of research interest due to its huge application potential. However, existing datasets have a limited number of images and annotations, and lack variety in human appearances and coverage of challenging cases in unconstrained environments. In this paper, we introduce a new benchmark, Look into Person (LIP), that makes a significant advance in terms of scalability, diversity and difficulty, a contribution that we feel is crucial for future developments in human-centric analysis. This...
Domain adaptation enables the learner to safely generalize into novel environments by mitigating domain shifts across distributions. Previous works may not effectively uncover the underlying reasons that would lead to drastic model degradation on the target task. In this paper, we empirically reveal that the erratic discrimination of the target domain mainly stems from its much smaller feature norms with respect to those of the source domain. To this end, we propose a parameter-free Adaptive Feature Norm approach. We demonstrate that progressively adapting...
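"Progressively adapting" feature norms can be read as a stepwise penalty: at each iteration, the target for every sample is its feature norm under the previous iteration's parameters plus a fixed increment. The sketch below assumes that form; `prev_norms` and `delta_r` are hypothetical names, and in an autodiff framework `prev_norms` would be detached (treated as constants) so that only the current norms receive gradients.

```python
import math


def l2_norm(v):
    """Euclidean norm of a feature vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def stepwise_norm_loss(prev_norms, features, delta_r):
    """Mean of (prev_norm + delta_r - ||f||)^2 over a batch.

    Assumption: prev_norms are the norms computed with the previous
    iteration's parameters, so minimizing this loss nudges each feature
    norm up by about delta_r per step.
    """
    total = 0.0
    for prev, f in zip(prev_norms, features):
        total += (prev + delta_r - l2_norm(f)) ** 2
    return total / len(features)
```

The loss is zero exactly when every norm has grown by `delta_r` relative to its previous value, which is what makes the enlargement gradual rather than a jump to a fixed target.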
Resembling the rapid learning capability of humans, low-shot learning empowers vision systems to understand new concepts by training with few samples. Leading approaches are derived from meta-learning on images with a single visual object. Obfuscated by complex backgrounds and multiple objects in one image, they can hardly promote research on low-shot object detection/segmentation. In this work, we present a flexible and general methodology to achieve these tasks. Our work extends Faster/Mask R-CNN by proposing meta-learning over RoI...
Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval. Conventional methods often study these two steps separately, e.g., learning hash functions from a predefined hand-crafted feature space. Meanwhile, the bit lengths of output hashing codes are preset in most previous methods, neglecting the significance level of different bits and restricting their practical flexibility. To address these issues, we propose a supervised learning framework to generate compact and bit-scalable hashing codes directly from raw...
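One way to make a hash code bit-scalable is to attach a significance weight to each bit, keep only the k most significant bits for a given budget, and rank database items by weighted Hamming distance. The sketch below assumes that scheme with hand-set weights; in a real system the weights would be learned, and the function names are hypothetical.

```python
def weighted_hamming(a, b, weights):
    """Weighted Hamming distance between two {0, 1} codes."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def truncate_code(code, weights, k):
    """Keep the k bits with the largest weights, in their original bit order."""
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    keep = sorted(top)
    return [code[i] for i in keep], [weights[i] for i in keep]


def rank(query, database, weights, k):
    """Rank database codes by weighted Hamming distance on the top-k bits."""
    q, w = truncate_code(query, weights, k)
    dists = [weighted_hamming(q, truncate_code(c, weights, k)[0], w) for c in database]
    return sorted(range(len(database)), key=lambda i: dists[i])
```

The same stored codes then serve any bit budget: a shorter code simply drops the least significant bits instead of requiring a separately trained model.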
Person re-identification has usually been solved as either the matching of single-image representation (SIR) or the classification of cross-image representation (CIR). In this work, we exploit the connection between these two categories of methods, and propose a joint learning framework to unify SIR and CIR using a convolutional neural network (CNN). Specifically, our deep architecture contains one shared sub-network together with two sub-networks that extract the SIRs of given images and the CIRs of given image pairs, respectively. The SIR sub-network is required to be...
Unsupervised domain adaptation (UDA) conventionally assumes labeled source samples coming from a single underlying source distribution. Whereas in practical scenarios, labeled data are typically collected from diverse sources. The multiple sources are different not only from the target but also from each other; thus, domain adapters should not be modeled in the same way. Moreover, those sources may not completely share their categories, which further brings a new transfer challenge called category shift. In this paper, we propose a deep cocktail network...
Understanding a scene in depth not only involves locating/recognizing individual objects, but also requires inferring the relationships and interactions among them. However, since the distribution of real-world relationships is seriously unbalanced, existing methods perform quite poorly for the less frequent relationships. In this work, we find that the statistical correlations between object pairs and their relationships can effectively regularize the semantic space and make prediction less ambiguous, and thus well address the unbalanced distribution issue. To achieve...
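A simple instance of "statistical correlations between object pairs and their relationships" is the empirical distribution of predicates given a (subject, object) category pair, counted over training triplets. The sketch below computes that prior with optional additive smoothing; the smoothing choice and names are assumptions, and how the paper injects such correlations into its model is not shown here.

```python
from collections import Counter, defaultdict


def relation_prior(triplets, smoothing=1.0, predicates=None):
    """Estimate p(predicate | subject, object) from (subject, predicate, object) triplets."""
    counts = defaultdict(Counter)
    for s, p, o in triplets:
        counts[(s, o)][p] += 1
    if predicates is None:
        predicates = sorted({p for _, p, _ in triplets})
    prior = {}
    for pair, c in counts.items():
        # Additive smoothing spreads a little mass onto unseen predicates.
        total = sum(c.values()) + smoothing * len(predicates)
        prior[pair] = {p: (c[p] + smoothing) / total for p in predicates}
    return prior
```

Such a prior tells the model, for example, that a (man, horse) pair is far more likely to be linked by "ride" than by "on", which narrows the space of plausible predicates for rare pairs.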
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low resolution image) with a focus on the proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera image acquisition pipeline. The operators were learnable through the provided pairs of low and high resolution train images. The tracks had 145, 114, 101, and 113 registered participants, resp., and 31 teams competed in the final testing...
Human parsing and pose estimation have recently received considerable interest due to their substantial application potentials. However, the existing datasets have limited numbers of images and annotations and lack a variety of human appearances and coverage of challenging cases in unconstrained environments. In this paper, we introduce a new benchmark named "Look into Person (LIP)" that provides a significant advancement in terms of scalability, diversity, and difficulty, which are crucial for future developments in human-centric...
In this paper, we present an image parsing to text description (I2T) framework that generates text descriptions of image and video content based on image understanding. The proposed I2T framework follows three steps: 1) input images (or video frames) are decomposed into their constituent visual patterns by an image parsing engine, in a spirit similar to parsing sentences in natural language; 2) the image parsing results are converted into a semantic representation in the form of the Web Ontology Language (OWL), which enables seamless integration with general knowledge bases; 3) a text generation...
This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed...
Integrating multiple different yet complementary feature representations has been proved to be an effective way of boosting tracking performance. This paper investigates how to perform robust object tracking in challenging scenarios by adaptively incorporating information from grayscale and thermal videos, and proposes a novel collaborative algorithm for online tracking. In particular, an adaptive fusion scheme is proposed based on sparse representation in a Bayesian filtering framework. We jointly optimize the sparse codes and the...