- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Advanced Image Processing Techniques
- Human Pose and Action Recognition
- Adversarial Robustness in Machine Learning
- Anomaly Detection Techniques and Applications
- Image Processing Techniques and Applications
- Visual Attention and Saliency Detection
- Image Enhancement Techniques
- COVID-19 diagnosis using AI
- Advanced Computational Techniques and Applications
- Image and Object Detection Techniques
- Remote-Sensing Image Classification
- Machine Learning and Data Classification
- Advanced Algorithms and Applications
- Generative Adversarial Networks and Image Synthesis
- Ocular Diseases and Behçet’s Syndrome
- Glaucoma and retinal disorders
- Face and Expression Recognition
- Retinal Imaging and Analysis
Beijing Institute of Technology
2008-2025
Southeast University
2025
Chongqing University of Technology
2022-2025
National Institute of Allergy and Infectious Diseases
2023-2024
National Institutes of Health
2020-2024
Shenyang Ligong University
2020-2024
Vi Technology (United States)
2019-2023
Megvii (China)
2017-2023
Jilin Electric Power Research Institute (China)
2022
Tongji University
2022
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower...
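The residual reformulation described in the abstract above can be illustrated with a minimal sketch. It assumes plain Python lists as feature vectors; `residual_block`, `zero_residual`, and `scaled_relu` are hypothetical names chosen for illustration and are not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x: the layers model the residual F, and the input is added back."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("the residual branch must preserve the input shape")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    # Hypothetical residual branch that outputs zeros, so the block is the identity.
    return [0.0 for _ in x]


def scaled_relu(x: Vector) -> Vector:
    # Hypothetical residual branch: half of ReLU(x).
    return [0.5 * max(v, 0.0) for v in x]


if __name__ == "__main__":
    print(residual_block([1.0, -2.0, 3.0], zero_residual))  # identity mapping
    print(residual_block([1.0, -2.0, 3.0], scaled_relu))
```

When the residual branch outputs zeros the block reduces to the identity mapping, which is one intuition for why adding such blocks need not make optimization harder.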
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error...
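A channel shuffle can be sketched as a pure index permutation. The version below assumes the usual formulation (view the channels as a `groups × channels_per_group` grid, transpose, and flatten) and operates on a flat list of channel labels; `channel_shuffle` is a hypothetical helper name, not an API from the paper's code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups: reshape to (groups, n), transpose, flatten."""
    total = len(channels)
    if groups <= 0 or total % groups != 0:
        raise ValueError("the channel count must be divisible by the group count")
    per_group = total // groups
    # Output position (i, g) reads input position (g, i), i.e. a transpose.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


if __name__ == "__main__":
    # Two groups of three channels: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

After the shuffle, each consecutive group contains channels from every original group, so a following group convolution can mix information across groups.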
In this report, we present some experienced improvements to the YOLO series, forming a new high-performance detector, YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA, to achieve state-of-the-art results across a large scale range of models: For YOLO-Nano with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in...
We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3 × 3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our knowledge. On NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101...
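The idea of structural re-parameterization can be sketched on a toy case. The example below is a simplification under stated assumptions: a single channel, no batch normalization, and a training-time block made of a 3 × 3 convolution, a 1 × 1 convolution, and an identity branch. All function names are hypothetical. It shows that the three branches collapse into one 3 × 3 kernel that produces the same output.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel cross-correlation with zero padding (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


def multi_branch(image: Matrix, k3: Matrix, k1: Matrix) -> Matrix:
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    a, b = conv2d_same(image, k3), conv2d_same(image, k1)
    return [[a[r][c] + b[r][c] + image[r][c] for c in range(len(image[0]))]
            for r in range(len(image))]


def merge_branches(k3: Matrix, k1: Matrix) -> Matrix:
    """Inference-time form: fold the 1x1 and identity branches into the centre tap."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0] + 1.0
    return merged


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[0.5, 0.0, -0.5], [1.0, 2.0, 1.0], [0.0, -1.0, 0.0]]
    k1 = [[0.25]]
    print(multi_branch(image, k3, k1))
    print(conv2d_same(image, merge_branches(k3, k1)))
```

The merge works because convolution is linear: a 1 × 1 kernel and the identity are both 3 × 3 kernels that are zero everywhere except the centre tap.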
This paper revisits feature pyramids networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion. From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids: utilizing only one-level feature for detection. Based on the simple and efficient solution, we present You Only Look One-level Feature (YOLOF). In our method, two key components, Dilated...
This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed...
Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by the target, and the repulsion by other surrounding objects. The repulsion term prevents...
In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks. We first train a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure given the target network. We use a simple stochastic structure sampling method for training the PruningNet. Then, we apply an evolutionary procedure to search for good-performing pruned networks. The search is highly efficient because the weights are directly generated by the trained PruningNet and we do not need any finetuning at search time. With...
Human detection has witnessed impressive progress in recent years. However, the occlusion issue of detecting humans in highly crowded environments is far from solved. To make matters worse, crowd scenarios are still under-represented in current human detection benchmarks. In this paper, we introduce a new dataset, called CrowdHuman, to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, rich-annotated and contains high diversity. There are a total of $470K$ human instances from the train and validation subsets, and $\sim 22.6$ persons...
The problem of quantizing the activations of a deep neural network is considered. An examination of the popular binary quantization approach shows that this consists of approximating a classical non-linearity, the hyperbolic tangent, by two functions: a piecewise constant sign function, which is used in feedforward network computations, and a piecewise linear hard tanh function, used in the backpropagation step during network learning. The problem of approximating the widely used ReLU non-linearity is then considered. A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have efficient...
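The two-function approximation that the abstract attributes to binary quantization (sign in the forward pass, hard tanh in the backward pass) can be sketched for a scalar activation. This is an illustrative sketch only: the function names are hypothetical, mapping zero to +1 is an assumed convention, and the HWGQ quantizer itself is not implemented here.

```python
def sign_forward(x: float) -> float:
    """Forward pass: piecewise constant sign function (zero mapped to +1 by assumption)."""
    return 1.0 if x >= 0.0 else -1.0


def hard_tanh(x: float) -> float:
    """Piecewise linear surrogate of tanh: identity on [-1, 1], clipped outside."""
    return max(-1.0, min(1.0, x))


def hard_tanh_grad(x: float) -> float:
    """Derivative of hard tanh: 1 inside [-1, 1], 0 outside."""
    return 1.0 if -1.0 <= x <= 1.0 else 0.0


def binary_activation_backward(x: float, upstream_grad: float) -> float:
    """Backward pass: use the hard tanh derivative in place of the sign derivative."""
    return upstream_grad * hard_tanh_grad(x)


if __name__ == "__main__":
    for x in (-2.0, -0.3, 0.3, 2.0):
        print(x, sign_forward(x), binary_activation_backward(x, upstream_grad=2.0))
```

The surrogate is needed because the sign function has zero derivative almost everywhere, so using its true gradient would block learning.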
In this paper, we introduce a new large-scale object detection dataset, Objects365, which has 365 object categories over 600K training images. More than 10 million high-quality bounding boxes are manually labeled through a three-step, carefully designed annotation pipeline. It is the largest object detection dataset (with full annotation) so far and establishes a more challenging benchmark for the community. Objects365 can serve as a better feature learning dataset for localization-sensitive tasks like object detection and semantic segmentation. The...
Recent research on super-resolution has achieved great success due to the development of deep convolutional neural networks (DCNNs). However, super-resolution of arbitrary scale factor has been ignored for a long time. Most previous researchers regard super-resolution of different scale factors as independent tasks. They train a specific model for each scale factor, which is inefficient in computing, and prior work only takes several integer scale factors into consideration. In this work, we propose a novel method called Meta-SR to firstly solve super-resolution of arbitrary scale factor (including non-integer...
Occluded person re-identification (ReID) aims to match occluded person images to holistic ones across dis-joint cameras. In this paper, we propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment. At first, we use a CNN backbone to learn feature maps and a key-points estimation model to extract semantic local features. Even so, occluded images still suffer from occlusion and outliers. Then, we view the extracted local features of an image as nodes of a graph and propose an adaptive direction graph convolutional...
In state-of-the-art image retrieval systems, an image is represented by a bag of visual words obtained by quantizing high-dimensional local image descriptors, and scalable schemes inspired by text retrieval are then applied for large scale image indexing and retrieval. Bag-of-words representations, however: 1) reduce the discriminative power of image features due to feature quantization; and 2) ignore geometric relationships among visual words. Exploiting such geometric constraints, by estimating a 2D affine transformation between a query image and each candidate image, has...
This paper considers a realistic problem in the person re-identification (re-ID) task, i.e., partial re-ID. Under the partial re-ID scenario, the images may contain a partial observation of a pedestrian. If we directly compare a partial pedestrian image with a holistic one, the extreme spatial misalignment significantly compromises the discriminative ability of the learned representation. We propose a Visibility-aware Part Model (VPM) for partial re-ID, which learns to perceive the visibility of regions through self-supervision. The visibility awareness allows VPM to extract...
Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object. In this paper, we innovatively revisit the label assignment from a global perspective and propose to formulate the assigning procedure as an Optimal Transport (OT) problem, a well-studied topic in Optimization Theory. Concretely, we define the unit transportation cost between each demander (anchor) and supplier (gt) pair as the weighted summation of their classification and regression losses. After...
In this paper, we propose a novel query design for the transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding does not have an explicit physical meaning and we cannot explain where it will focus on. It is difficult to optimize as the prediction slot of each object query does not have a specific mode. In other words, each object query will not focus on a specific region. To solve these problems, in our query design, object queries are based on anchor points, which are widely used in CNN-based detectors. So each object query focuses on the objects near the anchor point. Moreover, our query design can...