- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Visual Attention and Saliency Detection
- Advanced Vision and Imaging
- Remote-Sensing Image Classification
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Image Retrieval and Classification Techniques
- COVID-19 diagnosis using AI
- Image Enhancement Techniques
- Advanced Image Processing Techniques
- Image Processing Techniques and Applications
- Image and Signal Denoising Methods
- Advanced Image Fusion Techniques
- Gait Recognition and Analysis
- Machine Learning and ELM
- Face recognition and analysis
- Face and Expression Recognition
- Industrial Vision Systems and Defect Detection
- Hand Gesture Recognition Systems
- Adversarial Robustness in Machine Learning
University of Sheffield
2023-2025
Tsinghua University
2021-2025
University of Warwick
2019-2025
Aberystwyth University
2020-2024
Sichuan University
2024
Tencent (China)
2023
Anhui University of Technology
2022
Xidian University
2001-2021
Lancaster University
2017-2020
Beihang University
2018-2019
We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3×3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique, so that the model is named RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our knowledge. On an NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101...
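The re-parameterization idea in the abstract above rests on the linearity of convolution: parallel linear branches can be folded into one kernel. The following is a minimal single-channel sketch, assuming a block with a 3×3 branch, a 1×1 branch, and an identity branch (the abstract only says "multi-branch", so this branch set is an assumption), and ignoring batch normalization and multiple channels. The function names `conv3x3`, `multi_branch`, and `merge_branches` are hypothetical and chosen for illustration only.

```python
def conv3x3(image, kernel):
    """Zero-padded ("same") 3x3 cross-correlation on a single-channel 2-D list."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form (assumed): 3x3 branch + 1x1 branch + identity branch."""
    y3 = conv3x3(image, k3)
    h, w = len(image), len(image[0])
    return [[y3[i][j] + k1 * image[i][j] + image[i][j] for j in range(w)]
            for i in range(h)]


def merge_branches(k3, k1):
    """Fold the 1x1 weight and the identity into the centre tap of the 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0  # a 1x1 conv and an identity are both centre-only 3x3 kernels
    return merged


# Usage: both forms should agree up to floating-point error.
image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, -0.5], [1.0, 0.0, -0.25]]
k1 = 0.75
train_out = multi_branch(image, k3, k1)
deploy_out = conv3x3(image, merge_branches(k3, k1))
max_gap = max(abs(a - b) for ra, rb in zip(train_out, deploy_out) for a, b in zip(ra, rb))
print("largest difference between the two forms:", max_gap)
```

A real implementation would also have to fold normalization statistics and handle multi-channel kernels; this sketch only shows why the merged single-branch model can compute the same function.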
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depthwise convolutions, to design efficient high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to the commonly used 3×3. RepLKNet...
As designing an appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human work or numerous GPU hours, the research community is soliciting architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on our real-world applications. We propose the Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square...
Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components...
Enabling bi-directional retrieval of images and texts is important for understanding the correspondence between vision and language. Existing methods leverage the attention mechanism to explore such correspondence in a fine-grained manner. However, most of them consider all semantics equally and thus align them uniformly, regardless of their diverse complexities. In fact, semantics are diverse (i.e., involving different kinds of semantic concepts), and humans usually follow a latent structure to combine them into understandable languages. It may be difficult...
We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs. The block is named Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multiscale convolutions, and average pooling. After training, a DBB can be equivalently converted into a single conv layer for deployment. Unlike the advancements...
Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features with the capability of dealing with spatial transformations. However, such excellent properties have not been well explored in the popular deep convolutional neural networks (DCNNs). In this paper, we propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of deep learned features to orientation and scale changes. By only manipulating the basic element of DCNNs based on Gabor filters, i.e., the convolution...
RGB-induced salient object detection has recently witnessed substantial progress, which is attributed to the superior feature learning capability of deep convolutional neural networks (CNNs). However, such detections suffer from challenging scenarios characterized by cluttered backgrounds, low-light conditions and variations in illumination. Instead of improving RGB-based saliency detection, this paper takes advantage of the complementary benefits of RGB and thermal infrared images. Specifically, we propose a...
Redundancy is widely recognized in Convolutional Neural Networks (CNNs), which makes it possible to remove some unimportant filters from convolutional layers so as to slim the network with an acceptable performance drop. Inspired by the linearity of convolution, we seek to make some filters increasingly close and eventually identical for network slimming. To this end, we propose Centripetal SGD (C-SGD), a novel optimization method, which can train several filters to collapse into a single point in the parameter hyperspace. When the training is completed, the removal...
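The abstract above is cut off before it describes the removal step, so the following is only a hedged illustration of why identical filters are redundant under linearity. It uses a tiny two-layer fully connected network as a stand-in for convolutional layers (an assumption made to keep the sketch short), and the merge rule shown (adding the dropped unit's outgoing weights onto the surviving unit) is an illustrative choice, not a quotation of the C-SGD procedure. All function names are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def forward(w1, w2, x):
    """Two-layer network: linear -> ReLU -> linear."""
    return matvec(w2, relu(matvec(w1, x)))


def merge_identical(w1, w2, keep, drop):
    """Remove hidden unit `drop`, a duplicate of `keep`, without changing the output.

    The next layer's weights on `drop` are added to those on `keep`, because
    both units always produce the same activation.
    """
    assert w1[keep] == w1[drop], "units must have identical incoming weights"
    new_w1 = [row for idx, row in enumerate(w1) if idx != drop]
    new_w2 = []
    for row in w2:
        row = row[:]
        row[keep] += row[drop]
        del row[drop]
        new_w2.append(row)
    return new_w1, new_w2


# Usage: hidden units 0 and 2 are identical, so unit 2 can be removed.
w1 = [[1.0, -2.0], [0.5, 0.5], [1.0, -2.0]]
w2 = [[1.0, 2.0, 3.0], [-1.0, 0.0, 0.5]]
slim_w1, slim_w2 = merge_identical(w1, w2, keep=0, drop=2)
x = [3.0, 1.0]
print(forward(w1, w2, x), forward(slim_w1, slim_w2, x))
```

The same argument carries over to convolutional filters, where each hidden unit corresponds to an output channel and the next layer's weights on that channel are summed in the same way.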
For efficiently retrieving nearest neighbors from large-scale multiview data, hashing methods have recently been widely investigated, which can substantially improve query speeds. In this paper, we propose an effective probability-based semantics-preserving hashing (SePH) method to tackle the problem of cross-view retrieval. Considering the semantic consistency between views, SePH generates one unified hash code for all observed views of any instance. For training, SePH first transforms the given semantic affinities of training data...
We introduce a simple yet effective episode-based training framework for zero-shot learning (ZSL), where the learning system is required to recognize unseen classes given only the corresponding class semantics. During training, the model is trained within a collection of episodes, each of which is designed to simulate a zero-shot classification task. Through training on multiple episodes, the model progressively accumulates ensemble experiences on predicting the mimetic unseen classes, which will generalize well on the real unseen classes. Based on this training framework, we propose a novel generative model that...
For Visible-Infrared person ReIDentification (VI-ReID), existing modality-specific information compensation based models try to generate the images of the missing modality from existing ones for reducing the cross-modality discrepancy. However, because of the large modality discrepancy between visible and infrared images, the generated images usually have low qualities and introduce much more interfering information (e.g., color inconsistency). This greatly degrades the subsequent VI-ReID performance. Alternatively, we present a novel Feature-level...
We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge...
Recent years have witnessed a big leap in automatic visual saliency detection attributed to advances in deep learning, especially Convolutional Neural Networks (CNNs). However, inferring the saliency of each image part separately, as was adopted by most CNN methods, inevitably leads to an incomplete segmentation of the salient object. In this paper, we describe how to use the property of part-object relations endowed by the Capsule Network (CapsNet) to solve the problems that fundamentally hinge on relational inference for visual saliency detection....
Skeleton-based action recognition has been extensively studied, but it remains an unsolved problem because of the complex variations of skeleton joints in 3-D spatiotemporal space. To handle this issue, we propose a new temporal-then-spatial recalibration method named memory attention networks (MANs) and deploy MANs using the temporal attention recalibration module (TARM) and spatiotemporal convolution module (STCM). In the TARM, a novel temporal attention mechanism is built based on residual learning to recalibrate frames of skeleton data temporally. In the STCM, the recalibrated sequence...
Semantic segmentation models gain robustness against poor lighting conditions by virtue of complementary information from visible (RGB) and thermal images. Despite its importance, most existing RGB-T semantic segmentation models perform primitive fusion strategies, such as concatenation, element-wise summation and weighted summation, to fuse features from different modalities. These strategies, unfortunately, overlook the modality differences due to different imaging mechanisms, so that they suffer from the reduced discriminability of the fused features. To...
Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, there are still two broad limitations: 1) most existing methods adopt an encoder-decoder framework, where the contextual information is sequentially encoded using long short-term memory (LSTM). However, the forget gate mechanism of LSTM makes it vulnerable when dealing with a long sequence; and 2) the vast majority of prior arts consider regions of interest (RoIs) equally important,...
Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior arts only consider contextual clues during captioning but ignore potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of the vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we...
Capsule networks (CapsNets) have been known to be difficult to develop into a deeper architecture, which is desirable for high performance in the deep learning era, due to the complex capsule routing algorithms. In this article, we present a simple yet effective capsule routing algorithm, which is presented by a residual pose routing. Specifically, the higher-layer capsule pose is achieved by an identity mapping on the adjacently lower-layer capsule pose. Such residual pose routing has two advantages: 1) reducing the routing computation complexity and 2) avoiding gradient vanishing due to its residual learning framework. On top...
This is a repository copy of Virtual category learning: a semi-supervised learning method for dense prediction with extremely limited labels.