- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Image Enhancement Techniques
- Advanced Vision and Imaging
- Anomaly Detection Techniques and Applications
- Computer Graphics and Visualization Techniques
- Image Retrieval and Classification Techniques
- Infrastructure Maintenance and Monitoring
- Automated Road and Building Extraction
- Advanced Image Processing Techniques
- Hand Gesture Recognition Systems
- Adversarial Robustness in Machine Learning
- Remote Sensing and LiDAR Applications
- Robotics and Sensor-Based Localization
- Advanced Data Storage Technologies
- Industrial Vision Systems and Defect Detection
- Advanced Image Fusion Techniques
- Visual Attention and Saliency Detection
- Generative Adversarial Networks and Image Synthesis
- Face recognition and analysis
- Neural Networks and Applications
University of Macau
2024
National University of Defense Technology
2024
Shandong University of Science and Technology
2024
Xi'an Jiaotong University
2022-2023
Megvii (China)
2017-2022
University of Tennessee Health Science Center
2022
German Center for Neurodegenerative Diseases
2022
Vi Technology (United States)
2019-2022
Northwestern Polytechnical University
1998-2022
Fudan University
2021
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The...
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This enables us to train extremely deep...
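As a rough illustration of how PReLU generalizes the rectified unit, the sketch below applies the activation to scalars in plain Python. The function names and the example slope of 0.25 are assumptions made for illustration; in the paper the slope is a learned parameter, and the initialization method is not shown here.

```python
def prelu(x: float, a: float) -> float:
    """PReLU: pass positive inputs through, scale negative inputs by slope `a`."""
    return x if x > 0 else a * x


def prelu_grad_a(x: float) -> float:
    """Derivative of the PReLU output with respect to the slope `a`."""
    return 0.0 if x > 0 else x


# Illustrative inputs (not from the source).
inputs = [-2.0, -0.5, 0.0, 1.5]

# With a = 0 the unit reduces to the traditional rectified unit (ReLU).
relu_out = [prelu(x, 0.0) for x in inputs]

# With a nonzero slope, negative inputs are scaled instead of zeroed.
sloped_out = [prelu(x, 0.25) for x in inputs]

# The slope only receives gradient from non-positive inputs.
slope_grads = [prelu_grad_a(x) for x in inputs]
```

Because the only extra parameter is the slope, the added cost per unit is a single multiplication, which matches the "nearly zero extra computational cost" claim.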
In this paper, we propose a simple but effective image prior, the dark channel prior, to remove haze from a single input image. The dark channel prior is a kind of statistics of outdoor haze-free images. It is based on a key observation: most local patches in outdoor haze-free images contain some pixels whose intensity is very low in at least one color channel. Using this prior with the haze imaging model, we can directly estimate the thickness of the haze and recover a high-quality haze-free image. Results on a variety of hazy images demonstrate the power of the proposed prior. Moreover, a high-quality depth map can also be obtained as...
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower...
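The idea of learning a residual function with reference to the layer input can be sketched with a tiny two-layer block in plain Python. The helper names, the two-layer shape, and the example weights are assumptions for illustration; the sketch leaves out convolution, normalization, and any activation after the addition.

```python
def relu(v):
    """Element-wise rectifier on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Return F(x) + x, where F is two linear layers with a ReLU in between."""
    residual = linear(w2, relu(linear(w1, x)))
    return [r + xi for r, xi in zip(residual, x)]


x = [1.0, -2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With zero weights the residual vanishes and the block is an identity mapping.
out_zero = residual_block(x, zero, zero)

# With identity weights the residual is relu(x), added on top of x.
out_identity = residual_block(x, identity, identity)
```

The zero-weight case shows why such blocks are easy to optimize toward an identity mapping: the layers only need to drive the residual to zero.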
In this paper, we propose a simple but effective image prior, the dark channel prior, to remove haze from a single input image. The dark channel prior is a kind of statistics of haze-free outdoor images. It is based on a key observation: most local patches in haze-free outdoor images contain some pixels which have very low intensities in at least one color channel. Using this prior with the haze imaging model, we can directly estimate the thickness of the haze and recover a high-quality haze-free image. Results on a variety of outdoor haze images demonstrate the power of the proposed prior. Moreover, a high-quality depth map can also be obtained as a by-product of haze removal.
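The key observation behind the dark channel can be sketched as a minimum over color channels followed by a minimum over a local patch. The function below is a minimal pure-Python version; the image representation (nested lists of RGB tuples in [0, 1]), the `radius` parameter, and the toy images are assumptions for illustration, and the later steps (haze thickness estimation and image recovery) are not shown.

```python
def dark_channel(image, radius=1):
    """Compute the dark channel of an H x W image of (r, g, b) tuples.

    Each output value is the minimum intensity over all color channels
    of all pixels in the square patch centered on that pixel.
    """
    height, width = len(image), len(image[0])
    # Per-pixel minimum over the three color channels.
    min_rgb = [[min(pixel) for pixel in row] for row in image]
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Patch bounds are clipped at the image border.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            dark[y][x] = min(min_rgb[j][i] for j in ys for i in xs)
    return dark


# Toy "hazy" patch: every channel of every pixel is bright.
hazy = [[(0.8, 0.7, 0.9), (0.8, 0.7, 0.9)],
        [(0.8, 0.7, 0.9), (0.8, 0.7, 0.9)]]

# Toy "haze-free" patch: one pixel is very dark in one channel.
clear = [[(0.8, 0.7, 0.9), (0.9, 0.05, 0.8)],
         [(0.8, 0.7, 0.9), (0.8, 0.7, 0.9)]]

hazy_dark = dark_channel(hazy)
clear_dark = dark_channel(clear)
```

In the toy example the haze-free patch has a dark channel near zero while the hazy patch does not, which is the contrast the prior relies on.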
One of the recent trends [31, 32, 14] in network architecture design is stacking small filters (e.g., 1×1 or 3×3) in the entire network, because stacked small filters are more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional...
The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural networks. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex background, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to relieve the problem posed by these "hard" keypoints. More specifically, our algorithm includes two stages:...
Though recent advanced convolutional neural networks (CNNs) have been improving image recognition accuracy, the models are getting more complex and time-consuming. For real-world applications in industrial and commercial scenarios, engineers and developers are often faced with the requirement of a constrained time budget. In this paper, we investigate the accuracy of CNNs under constrained time cost. Under this constraint, the designs of the network architectures should exhibit trade-offs among factors like depth, numbers of filters, filter sizes,...
Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multi-task Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure, and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end...
Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure. We note that for the topic of interactive image segmentation, scribbles are very widely used in academic research and commercial software, and are recognized as one of the most user-friendly ways of interacting. In this paper, we propose to use scribbles to annotate images, and develop an algorithm to train convolutional networks for semantic segmentation supervised by scribbles. Our algorithm is based on a graphical model that jointly...
Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks that usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These...
In this paper, we formulate the problem of natural image matting as one of solving Poisson equations with the matte gradient field. Our approach, which we call Poisson matting, has the following advantages. First, the matte is directly reconstructed from a continuous matte gradient field by solving Poisson equations using boundary information from a user-supplied trimap. Second, by interactively manipulating the matte gradient field using a number of filtering tools, the user can further improve the results locally until he or she is satisfied. The modified local result is seamlessly integrated into the final result. Experiments on...
The topic of semantic segmentation has witnessed considerable progress due to the powerful features learned by convolutional neural networks (CNNs) [13]. The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions. This strategy introduces artificial boundaries on the images and may impact the quality of the extracted features. Besides, the operations on the raw image domain require computing thousands of networks on a single image, which is time-consuming. In this paper, we propose to exploit shape information via masking...
We extend the previous 2D cascaded object pose regression work [9] in two aspects so that it works better for 3D articulated objects. Our first contribution is 3D pose-indexed features that generalize the previous 2D parameterized features and achieve better invariance to 3D transformations. Our second contribution is a principled hierarchical regression that is adapted to the articulated object structure. It is therefore more accurate and faster. Comprehensive experiments verify the state-of-the-art accuracy and efficiency of the proposed approach on the challenging 3D hand pose estimation problem, on a public dataset and our new dataset.
The development of object detection in the era of deep learning, from R-CNN [11] and Fast/Faster R-CNN [10, 31] to the recent Mask R-CNN [14] and RetinaNet [24], has mainly come from novel networks, new frameworks, or loss design. However, mini-batch size, a key factor for the training of deep neural networks, has not been well studied for object detection. In this paper, we propose a Large Mini-Batch Object Detector (MegDet) to enable training with a large mini-batch size of up to 256, so that we can effectively utilize at most 128 GPUs to significantly shorten the training time. Technically,...
Alpha matting refers to the problem of softly extracting the foreground from an image. Given a trimap (specifying known foreground/background and unknown pixels), a straightforward way to compute the alpha value is to sample some known foreground and background colors for each unknown pixel. Existing sampling-based matting methods often collect samples near the unknown pixels only. They fail if good samples cannot be found nearby. In this paper, we propose a global sampling method that uses all samples available in the image. Our global sample set avoids missing good samples. A simple but effective...
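The "straightforward way" of computing alpha from one sampled foreground/background pair can be sketched from the standard compositing model, in which an observed color is alpha times the foreground plus (1 - alpha) times the background. The function name, the clamping to [0, 1], and the example colors are assumptions for illustration; how the global method selects the best pair among all samples is not shown, since the abstract is truncated before that step.

```python
def estimate_alpha(pixel, fg, bg):
    """Estimate alpha for one pixel from a sampled (foreground, background) pair.

    Projects (pixel - bg) onto (fg - bg), which solves
    pixel = alpha * fg + (1 - alpha) * bg in the least-squares sense.
    """
    diff_fb = [f - b for f, b in zip(fg, bg)]
    diff_pb = [p - b for p, b in zip(pixel, bg)]
    denom = sum(d * d for d in diff_fb)
    if denom == 0:
        # Foreground and background samples coincide; alpha is undetermined.
        return 0.0
    alpha = sum(p * d for p, d in zip(diff_pb, diff_fb)) / denom
    # Clamp to the valid opacity range.
    return min(1.0, max(0.0, alpha))


# Illustrative colors: a pixel that is 25% red foreground over blue background.
fg_sample = (1.0, 0.0, 0.0)
bg_sample = (0.0, 0.0, 1.0)
mixed_pixel = (0.25, 0.0, 0.75)

alpha = estimate_alpha(mixed_pixel, fg_sample, bg_sample)
```

The estimate is only as good as the sampled pair, which is why a method that fails to find good samples nearby produces poor mattes.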
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint which helps to reduce the complexity of filters. We develop an effective solution to this constrained nonlinear optimization problem. An algorithm is also presented for reducing the accumulated error when multiple layers are approximated....
In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform an intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. Thus, the speed of these networks is slow due to the heavy-head design in the architecture. Even if we significantly reduce the base model, the computation cost cannot be largely decreased accordingly. We propose a new...
Recent CNN-based object detectors, whether one-stage methods like YOLO, SSD, and RetinaNet or two-stage detectors like Faster R-CNN, R-FCN and FPN, usually try to directly finetune from ImageNet pre-trained models designed for image classification. There has been little work discussing a backbone feature extractor specifically designed for object detection. More importantly, there are several differences between the tasks of image classification and object detection. 1. Recent object detectors like FPN and RetinaNet usually involve extra stages against the task of image classification to handle objects with various...
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new...
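The direct propagation claim can be written out as a short derivation. The notation here is an assumption for illustration: $x_l$ is the input to block $l$, $\mathcal{F}$ is the residual function with weights $W_l$, and $\mathcal{E}$ is the loss. It assumes both the skip connection and the after-addition activation are identity mappings.

```latex
% One block with an identity skip connection and identity after-addition activation:
x_{l+1} = x_l + \mathcal{F}(x_l, W_l)

% Unrolling from block l to any deeper block L gives a direct forward path:
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i)

% By the chain rule, the backward signal also has a direct path (the constant 1):
\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \right)
```

The additive term 1 means part of the gradient reaches block $l$ without passing through any weight layer, regardless of the depth between $l$ and $L$.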