Jungong Han

ORCID: 0000-0003-4361-956X
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Visual Attention and Saliency Detection
  • Advanced Vision and Imaging
  • Remote-Sensing Image Classification
  • Video Analysis and Summarization
  • Anomaly Detection Techniques and Applications
  • Image Retrieval and Classification Techniques
  • COVID-19 diagnosis using AI
  • Image Enhancement Techniques
  • Advanced Image Processing Techniques
  • Image Processing Techniques and Applications
  • Image and Signal Denoising Methods
  • Advanced Image Fusion Techniques
  • Gait Recognition and Analysis
  • Machine Learning and ELM
  • Face recognition and analysis
  • Face and Expression Recognition
  • Industrial Vision Systems and Defect Detection
  • Hand Gesture Recognition Systems
  • Adversarial Robustness in Machine Learning

University of Sheffield
2023-2025

Tsinghua University
2021-2025

University of Warwick
2019-2025

Aberystwyth University
2020-2024

Sichuan University
2024

Tencent (China)
2023

Anhui University of Technology
2022

Xidian University
2001-2021

Lancaster University
2017-2020

Beihang University
2018-2019

We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3 × 3 convolutions and ReLU, while the training-time model has a multi-branch topology. Such decoupling is realized by a structural re-parameterization technique, so the model is named RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which, to the best of our knowledge, is the first time for a plain model. On an NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101...
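
A minimal sketch of the branch-merging arithmetic behind this kind of structural re-parameterization: a training-time block with parallel 3×3, 1×1 and identity branches folds into a single 3×3 convolution for inference. BatchNorm folding is omitted for brevity, and all shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def merge_branches(w3x3, w1x1, channels):
    """Fold parallel 3x3 + 1x1 + identity branches into one 3x3 kernel."""
    # Pad the 1x1 kernel to 3x3 so it can be added to the 3x3 kernel.
    w1x1_as_3x3 = F.pad(w1x1, [1, 1, 1, 1])
    # The identity branch is a 3x3 kernel with 1 at the centre for channel i -> i.
    w_id = torch.zeros(channels, channels, 3, 3)
    for i in range(channels):
        w_id[i, i, 1, 1] = 1.0
    return w3x3 + w1x1_as_3x3 + w_id

channels = 8
x = torch.randn(1, channels, 16, 16)
w3x3 = torch.randn(channels, channels, 3, 3)
w1x1 = torch.randn(channels, channels, 1, 1)

# Training-time multi-branch output.
y_multi = F.conv2d(x, w3x3, padding=1) + F.conv2d(x, w1x1) + x
# Inference-time single-branch output with the merged kernel.
y_single = F.conv2d(x, merge_branches(w3x3, w1x1, channels), padding=1)

print(torch.allclose(y_multi, y_single, atol=1e-4))  # True
```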

10.1109/cvpr46437.2021.01352 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper we demonstrate that using a few large kernels instead of a stack of small kernels could be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depthwise convolutions, to design efficient and high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to the commonly used 3×3. RepLKNet...
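
In the spirit of the re-parameterized large depthwise convolutions mentioned above, here is a minimal sketch: a small depthwise kernel trained in parallel with a very large one can be folded into the large kernel at inference by zero-padding it to the same size. Kernel and channel sizes are illustrative, not those of RepLKNet.

```python
import torch
import torch.nn.functional as F

channels, large_k, small_k = 4, 31, 5
x = torch.randn(1, channels, 64, 64)
w_large = torch.randn(channels, 1, large_k, large_k)   # depthwise: groups=channels
w_small = torch.randn(channels, 1, small_k, small_k)

# Training-time: two parallel depthwise branches.
y_train = (F.conv2d(x, w_large, padding=large_k // 2, groups=channels)
           + F.conv2d(x, w_small, padding=small_k // 2, groups=channels))

# Inference-time: pad the small kernel to 31x31 and add it onto the large one.
pad = (large_k - small_k) // 2
w_merged = w_large + F.pad(w_small, [pad, pad, pad, pad])
y_infer = F.conv2d(x, w_merged, padding=large_k // 2, groups=channels)

print(torch.allclose(y_train, y_infer, atol=1e-4))  # True
```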

10.1109/cvpr52688.2022.01166 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

As designing an appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human work or numerous GPU hours, the research community is soliciting architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on real-world applications. We propose the Asymmetric Convolution Block (ACB), an architecture-neutral structure used as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square...
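
A minimal sketch of the idea: parallel 3×3, 1×3 and 3×1 convolutions (BatchNorm omitted here) can be summed into a single 3×3 kernel, since the 1D kernels simply add onto the central row and column of the square kernel. Channel counts are illustrative.

```python
import torch
import torch.nn.functional as F

c_out, c_in = 6, 4
x = torch.randn(1, c_in, 20, 20)
w_sq = torch.randn(c_out, c_in, 3, 3)   # square 3x3 branch
w_h  = torch.randn(c_out, c_in, 1, 3)   # horizontal 1x3 branch
w_v  = torch.randn(c_out, c_in, 3, 1)   # vertical 3x1 branch

# Training-time: three parallel branches producing the same output size.
y_train = (F.conv2d(x, w_sq, padding=(1, 1))
           + F.conv2d(x, w_h, padding=(0, 1))
           + F.conv2d(x, w_v, padding=(1, 0)))

# Inference-time: fuse the 1D kernels into the square kernel.
w_fused = w_sq.clone()
w_fused[:, :, 1:2, :] += w_h   # add 1x3 onto the middle row
w_fused[:, :, :, 1:2] += w_v   # add 3x1 onto the middle column
y_infer = F.conv2d(x, w_fused, padding=(1, 1))

print(torch.allclose(y_train, y_infer, atol=1e-4))  # True
```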

10.1109/iccv.2019.00200 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and more for YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components...
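
For context on the NMS post-processing that, as stated above, hampers end-to-end deployment, here is a minimal, unoptimized sketch of greedy class-agnostic non-maximum suppression. It illustrates the extra sequential step that an NMS-free detector avoids; it is not code from YOLOv10, and the threshold is illustrative.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.65):
    """Keep the highest-scoring box, drop overlapping ones, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] -- the second box is suppressed
```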

10.48550/arxiv.2405.14458 preprint EN arXiv (Cornell University) 2024-05-23

Enabling bi-directional retrieval of images and texts is important for understanding the correspondence between vision and language. Existing methods leverage the attention mechanism to explore such correspondence in a fine-grained manner. However, most of them consider all semantics equally and thus align them uniformly, regardless of their diverse complexities. In fact, semantics are diverse (i.e., involving different kinds of semantic concepts), and humans usually follow a latent structure to combine them into understandable languages. It may be difficult...
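
A minimal, generic sketch of the fine-grained attention-based alignment referred to above (not the paper's exact model): each word attends over image region features by similarity, and the matching score aggregates the word-to-attended-region similarities. Shapes and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def matching_score(regions, words, temperature=9.0):
    """regions: (R, D) image region features; words: (W, D) word features."""
    regions = F.normalize(regions, dim=-1)
    words = F.normalize(words, dim=-1)
    sim = words @ regions.t()                        # (W, R) cosine similarities
    attn = F.softmax(temperature * sim, dim=-1)      # each word attends to regions
    attended = attn @ regions                        # (W, D) attended visual context
    word_scores = F.cosine_similarity(words, attended, dim=-1)
    return word_scores.mean()                        # image-sentence matching score

score = matching_score(torch.randn(36, 256), torch.randn(12, 256))
print(float(score))
```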

10.1109/cvpr42600.2020.01267 preprint EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs. The block is named Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multi-scale convolutions, and average pooling. After training, a DBB can be equivalently converted into a single conv layer for deployment. Unlike the advancements...
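
One concrete piece of the branch-to-convolution conversion described above: a K×K average pooling over C channels is itself a K×K convolution whose kernel holds 1/K² on the channel's own slice and 0 elsewhere, so such a branch can be absorbed into a single conv layer. A minimal check:

```python
import torch
import torch.nn.functional as F

channels, k = 4, 3
x = torch.randn(1, channels, 16, 16)

# Build the convolution kernel equivalent to average pooling.
w = torch.zeros(channels, channels, k, k)
for c in range(channels):
    w[c, c] = 1.0 / (k * k)

y_pool = F.avg_pool2d(x, kernel_size=k, stride=1, padding=1)
y_conv = F.conv2d(x, w, stride=1, padding=1)
print(torch.allclose(y_pool, y_conv, atol=1e-6))  # True
```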

10.1109/cvpr46437.2021.01074 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features with the capability of dealing with spatial transformations. However, such excellent properties have not been well explored in popular deep convolutional neural networks (DCNNs). In this paper, we propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of learned features to orientation and scale changes. By only manipulating the basic element of DCNNs based on Gabor filters, i.e., the convolution...
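
A minimal sketch of the modulation idea: learned convolution kernels are element-wise multiplied by fixed Gabor filters of several orientations, so the resulting filter bank inherits orientation structure. The Gabor parameters below are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=1.0, lam=2.0, gamma=0.5):
    """Real part of a Gabor filter at orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam)
    return envelope * carrier

# Modulate one learned 3x3 kernel with 4 orientations -> 4 steered variants.
learned = np.random.randn(3, 3)
orientations = [i * np.pi / 4 for i in range(4)]
steered_bank = np.stack([learned * gabor_kernel(3, th) for th in orientations])
print(steered_bank.shape)  # (4, 3, 3)
```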

10.1109/tip.2018.2835143 article EN IEEE Transactions on Image Processing 2018-05-10

RGB-induced salient object detection has recently witnessed substantial progress, which is attributed to the superior feature learning capability of deep convolutional neural networks (CNNs). However, such detections suffer from challenging scenarios characterized by cluttered backgrounds, low-light conditions and variations in illumination. Instead of improving RGB-based saliency detection, this paper takes advantage of the complementary benefits of thermal infrared images. Specifically, we propose a...

10.1109/tip.2019.2959253 article EN IEEE Transactions on Image Processing 2019-12-17

Redundancy is widely recognized in Convolutional Neural Networks (CNNs), which enables us to remove some unimportant filters from convolutional layers so as to slim the network with an acceptable performance drop. Inspired by the linearity of convolution, we seek to make some filters increasingly close and eventually identical for slimming. To this end, we propose Centripetal SGD (C-SGD), a novel optimization method, which can train several filters to collapse into a single point in the parameter hyperspace. When the training is completed, the removal...
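
A simplified schematic of the centripetal idea (a sketch, not the exact published update rule): filters assigned to the same cluster receive the cluster-averaged objective gradient plus a term pulling them toward the cluster mean, so their pairwise differences shrink until the cluster can be merged into one filter.

```python
import torch

def centripetal_step(filters, grads, clusters, lr=0.1, centripetal=0.03):
    """filters, grads: (N, ...) tensors; clusters: list of index lists."""
    new_filters = filters.clone()
    for idx in clusters:
        idx = torch.tensor(idx)
        mean_grad = grads[idx].mean(dim=0, keepdim=True)   # shared gradient
        mean_filt = filters[idx].mean(dim=0, keepdim=True)  # cluster centre
        new_filters[idx] = (filters[idx]
                            - lr * mean_grad
                            - centripetal * (filters[idx] - mean_filt))
    return new_filters

filters = torch.randn(4, 3, 3, 3)   # 4 filters of a toy conv layer
grads = torch.randn(4, 3, 3, 3)     # gradients held fixed here for illustration;
clusters = [[0, 1], [2, 3]]         # real training recomputes them every step
for _ in range(300):
    filters = centripetal_step(filters, grads, clusters)
print(torch.allclose(filters[0], filters[1], atol=1e-2))  # True: collapsed
```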

10.1109/cvpr.2019.00508 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

For efficiently retrieving nearest neighbors from large-scale multiview data, hashing methods have recently been widely investigated, as they can substantially improve query speeds. In this paper, we propose an effective probability-based semantics-preserving hashing (SePH) method to tackle the problem of cross-view retrieval. Considering the semantic consistency between views, SePH generates one unified hash code for all observed views of any instance. For training, it first transforms the given semantic affinities of training data...
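
A minimal sketch in the spirit of the probability-based idea: pairwise semantic affinities are normalized into a probability distribution, the pairwise distances of relaxed (real-valued) hash codes define another distribution, and the divergence between them is minimized. The sizes, relaxation and optimizer below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

n, bits = 32, 16
labels = torch.randint(0, 4, (n,))
affinity = (labels[:, None] == labels[None, :]).float()   # semantic affinities
p = affinity / affinity.sum()                             # target distribution

codes = torch.randn(n, bits, requires_grad=True)          # relaxed hash codes
opt = torch.optim.Adam([codes], lr=0.05)
for _ in range(200):
    codes_t = torch.tanh(codes)
    d = ((codes_t[:, None, :] - codes_t[None, :, :]) ** 2).sum(-1)    # soft code distances
    q = torch.softmax(-d.flatten(), dim=0).reshape(n, n)              # distance -> probability
    loss = (p * (torch.log(p + 1e-9) - torch.log(q + 1e-9))).sum()    # KL(p || q)
    opt.zero_grad()
    loss.backward()
    opt.step()

binary_codes = torch.sign(torch.tanh(codes.detach()))      # final binary codes
print(binary_codes.shape)  # torch.Size([32, 16])
```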

10.1109/tcyb.2016.2608906 article EN IEEE Transactions on Cybernetics 2016-09-29

We introduce a simple yet effective episode-based training framework for zero-shot learning (ZSL), where the system is required to recognize unseen classes given only the corresponding class semantics. During training, the model is trained within a collection of episodes, each of which is designed to simulate a zero-shot classification task. Through multiple episodes, the model progressively accumulates ensemble experiences in predicting mimetic unseen classes, and thus generalizes well to real unseen classes. Based on this framework, we propose a novel generative model that...
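
A minimal sketch of the episode construction described above: each episode splits the seen classes into a mimetic "unseen" subset and a support subset, so the model repeatedly practices recognizing classes that are withheld within the episode. The class counts and splitting ratio are illustrative.

```python
import random

def sample_episode(seen_classes, n_mimetic_unseen=5):
    classes = list(seen_classes)
    random.shuffle(classes)
    mimetic_unseen = classes[:n_mimetic_unseen]   # treated as unseen in this episode
    episode_seen = classes[n_mimetic_unseen:]     # available for training in this episode
    return episode_seen, mimetic_unseen

seen_classes = [f"class_{i}" for i in range(40)]
for episode in range(3):
    ep_seen, ep_unseen = sample_episode(seen_classes)
    print(f"episode {episode}: {len(ep_seen)} seen, mimetic unseen = {ep_unseen[:3]}...")
```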

10.1109/cvpr42600.2020.01405 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

For Visible-Infrared person ReIDentification (VI-ReID), existing modality-specific information compensation based models try to generate the images of the missing modality from the existing ones to reduce the cross-modality discrepancy. However, because of the large discrepancy between visible and infrared images, the generated images usually have low quality and introduce much more interfering information (e.g., color inconsistency). This greatly degrades the subsequent VI-ReID performance. Alternatively, we present a novel Feature-level...

10.1109/cvpr52688.2022.00720 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge...
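
A minimal sketch of the "equivalently merge" step, under the assumption of a 1×1 re-parameterization layer appended to a conv: a K×K convolution followed by a 1×1 convolution collapses into a single K×K convolution, and 1×1 rows driven to zero correspond to prunable output channels. The penalty-gradient training that zeroes those rows is omitted here.

```python
import torch
import torch.nn.functional as F

c_in, c_mid, c_out, k = 3, 8, 8, 3
x = torch.randn(1, c_in, 16, 16)
w_conv = torch.randn(c_mid, c_in, k, k)       # the "remembering" conv
w_comp = torch.randn(c_out, c_mid, 1, 1)      # the appended 1x1 layer
w_comp[5:] = 0.0                              # pretend 3 rows were driven to zero

# Two-layer form used during training.
y_two = F.conv2d(F.conv2d(x, w_conv, padding=1), w_comp)

# Merged single conv: each output kernel is a 1x1-weighted sum of w_conv kernels.
w_merged = torch.einsum("om,mikl->oikl", w_comp[:, :, 0, 0], w_conv)
y_one = F.conv2d(x, w_merged, padding=1)

print(torch.allclose(y_two, y_one, atol=1e-3))          # True
print((w_merged.abs().sum(dim=(1, 2, 3)) == 0).sum())   # 3 channels ready to prune
```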

10.1109/iccv48922.2021.00447 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Recent years have witnessed a big leap in automatic visual saliency detection attributed to advances in deep learning, especially Convolutional Neural Networks (CNNs). However, inferring the saliency of each image part separately, as was adopted by most CNN-based methods, inevitably leads to an incomplete segmentation of the salient object. In this paper, we describe how to use the property of part-object relations endowed by the Capsule Network (CapsNet) to solve the problems that fundamentally hinge on relational inference for saliency detection....
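
For context on the part-object mechanism mentioned above, here is a minimal sketch of standard routing-by-agreement between lower-level ("part") capsules and higher-level ("object") capsules. This is the generic CapsNet routing step, not the paper's saliency model; all sizes are illustrative.

```python
import torch

def squash(v, dim=-1, eps=1e-8):
    norm2 = (v ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * v / (norm2.sqrt() + eps)

def routing(predictions, n_iters=3):
    """predictions: (n_parts, n_objects, dim) votes from part to object capsules."""
    logits = torch.zeros(predictions.shape[:2])          # routing logits b_ij
    for _ in range(n_iters):
        coupling = torch.softmax(logits, dim=1)          # each part distributes itself
        objects = squash((coupling.unsqueeze(-1) * predictions).sum(dim=0))
        logits = logits + (predictions * objects.unsqueeze(0)).sum(dim=-1)  # agreement
    return objects, coupling

parts_to_objects = torch.randn(32, 4, 16)   # 32 part capsules voting for 4 object capsules
objects, coupling = routing(parts_to_objects)
print(objects.shape, coupling.shape)        # torch.Size([4, 16]) torch.Size([32, 4])
```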

10.1109/tpami.2021.3053577 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01

10.1109/cvpr52733.2024.01506 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Skeleton-based action recognition has been extensively studied, but it remains an unsolved problem because of the complex variations of skeleton joints in 3-D spatiotemporal space. To handle this issue, we propose a new temporal-then-spatial recalibration method named memory attention networks (MANs) and deploy MANs using a temporal attention recalibration module (TARM) and a spatiotemporal convolution module (STCM). In the TARM, a novel attention mechanism is built based on residual learning to recalibrate the frames of skeleton data temporally. In the STCM, the recalibrated sequence...
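
A minimal, generic sketch of the temporal recalibration idea (not the exact TARM from the paper): per-frame attention weights are predicted from the skeleton sequence and applied to the frames, with a residual connection so the recalibrated sequence stays close to the input. The scorer and feature sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TemporalRecalibration(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)       # one attention score per frame

    def forward(self, seq):                         # seq: (batch, frames, feat_dim)
        weights = torch.sigmoid(self.scorer(seq))   # (batch, frames, 1)
        return seq + weights * seq                  # residual recalibration

frames = torch.randn(2, 30, 75)   # e.g. 25 joints x 3 coords flattened per frame
print(TemporalRecalibration(75)(frames).shape)  # torch.Size([2, 30, 75])
```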

10.1109/tnnls.2021.3061115 article EN IEEE Transactions on Neural Networks and Learning Systems 2021-03-15

Semantic segmentation models gain robustness against poor lighting conditions by virtue of complementary information from visible (RGB) and thermal images. Despite its importance, most existing RGB-T semantic segmentation models adopt primitive fusion strategies, such as concatenation, element-wise summation and weighted summation, to fuse features from different modalities. These strategies, unfortunately, overlook the modality differences caused by different imaging mechanisms, so that they suffer from the reduced discriminability of the fused features. To...
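
The three "primitive" fusion strategies named above, shown concretely for a pair of RGB and thermal feature maps. The tensor shapes and weighting coefficient are illustrative; the paper's point is that these strategies ignore modality differences, which motivates a difference-aware fusion instead.

```python
import torch

rgb = torch.randn(1, 64, 32, 32)       # RGB feature map
thermal = torch.randn(1, 64, 32, 32)   # thermal feature map
alpha = 0.7                            # illustrative weighting coefficient

fused_concat = torch.cat([rgb, thermal], dim=1)          # concatenation -> (1, 128, 32, 32)
fused_sum = rgb + thermal                                # element-wise summation
fused_weighted = alpha * rgb + (1 - alpha) * thermal     # weighted summation

print(fused_concat.shape, fused_sum.shape, fused_weighted.shape)
```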

10.1109/cvpr46437.2021.00266 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, there are still two broad limitations: 1) most existing methods adopt an encoder-decoder framework, where the contextual information is sequentially encoded using a long short-term memory (LSTM); however, the forget gate mechanism of the LSTM makes it vulnerable when dealing with a long sequence; and 2) the vast majority of prior arts consider regions of interest (RoIs) equally important,...

10.1109/tnnls.2022.3152990 article EN IEEE Transactions on Neural Networks and Learning Systems 2022-03-11

Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior arts only consider contextual visual clues during captioning but ignore potentially important textual context; and 2) current imbalanced learning mechanisms limit the diversity of the vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we...

10.1109/tmm.2023.3241517 article EN IEEE Transactions on Multimedia 2023-01-01

Capsule networks (CapsNets) have been known to be difficult to develop into a deeper architecture, which is desirable for high performance in the deep learning era, due to their complex capsule routing algorithms. In this article, we present a simple yet effective routing algorithm, termed residual pose routing. Specifically, the higher-layer capsule pose is achieved by an identity mapping on the adjacently lower-layer pose. Such a routing has two advantages: 1) reducing the routing computation complexity and 2) avoiding gradient vanishing due to its residual learning framework. On top...
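
A minimal sketch of the residual pose mapping described above: the higher-layer capsule pose is obtained from the lower-layer pose through an identity (residual) mapping rather than an iterative routing loop. The linear transform and pose dimensions here are illustrative assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class ResidualPoseRouting(nn.Module):
    def __init__(self, pose_dim):
        super().__init__()
        self.transform = nn.Linear(pose_dim, pose_dim)

    def forward(self, lower_pose):                  # (n_capsules, pose_dim)
        # Identity mapping plus a learned residual: no iterative routing loop.
        return lower_pose + self.transform(lower_pose)

lower = torch.randn(8, 16)                           # 8 capsules with 16-D poses
print(ResidualPoseRouting(16)(lower).shape)          # torch.Size([8, 16])
```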

10.1109/tnnls.2023.3347722 article EN IEEE Transactions on Neural Networks and Learning Systems 2024-01-09

Virtual category learning: a semi-supervised learning method for dense prediction with extremely limited labels.

10.1109/tpami.2024.3367416 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-02-20