- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Face recognition and analysis
- Advanced Vision and Imaging
- Domain Adaptation and Few-Shot Learning
- Hand Gesture Recognition Systems
- Visual Attention and Saliency Detection
- Face and Expression Recognition
- Anomaly Detection Techniques and Applications
- Autonomous Vehicle Technology and Safety
- Advanced Image Processing Techniques
- Multimodal Machine Learning Applications
- Image Enhancement Techniques
- Forensic Anthropology and Bioarchaeology Studies
- Internet Traffic Analysis and Secure E-voting
- Robot Manipulation and Learning
- Numerical methods for differential equations
- Network Security and Intrusion Detection
- Advanced Numerical Methods in Computational Mathematics
- Robotics and Sensor-Based Localization
- Differential Equations and Numerical Methods
- Industrial Vision Systems and Defect Detection
- Generative Adversarial Networks and Image Synthesis
South China Agricultural University
2024
Shanghai Jiao Tong University
2023-2024
Tianjin University
2024
Southeast University
2023
Megvii (China)
2019-2022
Vi Technology (United States)
2019-2021
University of Hong Kong
2004-2021
Shanghai Normal University
2019-2021
Fudan University
2020
Microsoft Research Asia (China)
2010-2018
Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained...
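The idea of augmenting sampling locations with offsets can be illustrated with a minimal pure-Python sketch. It evaluates a single 3x3 kernel response at one position, shifting each of the nine taps by a fractional offset and reading the feature map by bilinear interpolation. The function names and the fixed, hand-supplied offsets are assumptions for illustration; in the approach described above the offsets are learned from the target task, which this sketch does not attempt.

```python
import math

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D list at fractional (y, x); zero outside the map."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value

def deformable_conv_at(feature, kernel, cy, cx, offsets):
    """3x3 kernel response centred at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th tap (row-major order) away from
    its regular grid position. All-zero offsets give a plain convolution tap.
    """
    out = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[ky + 1][kx + 1] * bilinear_sample(
                feature, cy + ky + dy, cx + kx + dx)
            k += 1
    return out

# Hypothetical 4x4 feature map and an all-ones kernel.
feature = [[float(4 * i + j) for j in range(4)] for i in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
plain = deformable_conv_at(feature, kernel, 1, 1, [(0.0, 0.0)] * 9)
shifted = deformable_conv_at(feature, kernel, 1, 1, [(0.0, 0.5)] * 9)
```

With zero offsets the function reduces to the regular grid sampling of a standard convolution, which is why such a module can stand in for its plain counterpart.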
Although it has been well believed for years that modeling relations between objects would help object recognition, there has not been evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance feature and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require...
Recent progress in salient object detection has exploited the boundary prior, or background information, to assist other saliency cues such as contrast, achieving state-of-the-art results. However, the usage of the boundary prior is very simple and fragile, and its integration with other cues is mostly heuristic. In this work, we present new methods to address these issues. First, we propose a robust background measure, called boundary connectivity. It characterizes the spatial layout of image regions with respect to image boundaries and is much more robust. It has an...
This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$. We find that a majority of loss functions, including the triplet loss and the softmax cross-entropy loss, embed $s_n$ and $s_p$ into similarity pairs and seek to reduce $(s_n-s_p)$. Such an optimization manner is inflexible, because the penalty strength on every single similarity score is restricted to be equal. Our intuition is that if a similarity score deviates far from the optimum, it should be emphasized. To this end, we simply re-weight each...
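The re-weighting intuition can be sketched in a few lines of Python. The specific weighting rule, the `margin` and `scale` parameters, and the function name below are assumptions for illustration: the abstract is truncated before it states the rule, so this follows one plausible form in which each similarity is weighted by its distance from an assumed optimum.

```python
import math

def reweighted_pair_loss(s_p, s_n, margin=0.25, scale=64.0):
    """Sketch of a pair-similarity loss with per-score weights.

    s_p: within-class similarities, s_n: between-class similarities,
    all assumed to lie in [-1, 1] (e.g. cosine similarity).
    """
    o_p, o_n = 1.0 + margin, -margin          # assumed optima for s_p and s_n
    delta_p, delta_n = 1.0 - margin, margin   # assumed decision margins
    # Scores far from their optimum receive a larger weight (clamped at zero).
    pos = sum(math.exp(-scale * max(o_p - s, 0.0) * (s - delta_p)) for s in s_p)
    neg = sum(math.exp(scale * max(s - o_n, 0.0) * (s - delta_n)) for s in s_n)
    return math.log1p(pos * neg)

# Hypothetical similarity scores.
well_separated = reweighted_pair_loss([0.9], [0.1])
poorly_separated = reweighted_pair_loss([0.3], [0.7])
```

Setting both weights to a constant would recover an equal-penalty loss on $(s_n-s_p)$, which is the inflexible case the abstract argues against.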
Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup as flow computation is relatively fast. The end-to-end training of the whole architecture...
Extending state-of-the-art object detectors from image to video is challenging. The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to exploit temporal information on the box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate and end-to-end learning framework for video object detection. It leverages temporal coherence on the feature level instead. It improves the per-frame features by aggregation...
In this paper, we study the task of 3D human pose estimation in the wild. This task is challenging due to the lack of training data, as existing datasets are either in-the-wild images with 2D pose or in-the-lab images with 3D pose. We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network that presents a two-stage cascaded structure. Our network augments a state-of-the-art 2D pose estimation sub-network with a 3D depth regression sub-network. Unlike previous two-stage approaches that train the two sub-networks sequentially and separately, our...
Regression based methods are not performing as well as detection based methods for human pose estimation. A central problem is that the structural information in the pose is not well exploited in previous regression methods. In this work, we propose a structure-aware regression approach. It adopts a reparameterized pose representation using bones instead of joints. It exploits the joint connection structure to define a compositional loss function that encodes the long range interactions in the pose. It is simple, effective, and general for both 2D and 3D pose estimation in a unified setting....
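A minimal Python sketch of the bone reparameterization and a pairwise loss built on it follows. The helper names, the `parent` array convention (parents listed before children, root marked with -1), and the choice of an L1 penalty over all joint pairs are assumptions for illustration, not the exact formulation of the work.

```python
def joints_to_bones(joints, parent):
    """bone[i] = joint[i] - joint[parent[i]]; the root keeps its own position."""
    bones = []
    for i, joint in enumerate(joints):
        p = parent[i]
        if p < 0:
            bones.append(tuple(joint))
        else:
            bones.append(tuple(a - b for a, b in zip(joint, joints[p])))
    return bones

def bones_to_joints(bones, parent):
    """Accumulate bones along the tree; assumes parents precede children."""
    joints = []
    for i, bone in enumerate(bones):
        p = parent[i]
        if p < 0:
            joints.append(tuple(bone))
        else:
            joints.append(tuple(a + b for a, b in zip(joints[p], bone)))
    return joints

def compositional_l1(pred_bones, gt_bones, parent):
    """L1 error on relative positions of every joint pair (sums of bones)."""
    pred = bones_to_joints(pred_bones, parent)
    gt = bones_to_joints(gt_bones, parent)
    loss = 0.0
    for u in range(len(parent)):
        for v in range(u + 1, len(parent)):
            for k in range(len(pred[u])):
                loss += abs((pred[u][k] - pred[v][k]) - (gt[u][k] - gt[v][k]))
    return loss

# Hypothetical three-joint chain in 2D.
parent = [-1, 0, 1]
gt_joints = [(0, 0), (1, 0), (1, 2)]
gt_bones = joints_to_bones(gt_joints, parent)
pred_bones = [(0, 0), (1.5, 0), (0, 2)]   # error only in the first bone
loss = compositional_l1(pred_bones, gt_bones, parent)
```

Because distant joint pairs are compared through sums of bones, an error in one bone is penalized in every pair whose path crosses it, which is one way to encode long range interactions.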
We extend the previous 2D cascaded object pose regression work [9] in two aspects so that it works better for 3D articulated objects. Our first contribution is 3D pose-indexed features that generalize the previous 2D parameterized features and achieve better invariance to 3D transformations. Our second contribution is a principled hierarchical regression that is adapted to the articulated object structure. It is therefore more accurate and faster. Comprehensive experiments verify the state-of-the-art accuracy and efficiency of the proposed approach on the challenging 3D hand pose estimation problem, on a public dataset and our new dataset.
We present a very efficient, highly accurate, "Explicit Shape Regression" approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during the test, without using a fixed parametric shape model as most...
Modeling data uncertainty is important for noisy images, but is seldom explored for face recognition. The pioneer work, PFE, considers uncertainty by modeling each face image embedding as a Gaussian distribution. It is quite effective. However, it uses a fixed feature (mean of the Gaussian) from an existing model. It only estimates the variance and relies on an ad-hoc and costly metric. Thus, it is not easy to use. It is unclear how uncertainty affects feature learning. This work applies data uncertainty learning to face recognition, such that the feature (mean) and uncertainty (variance) are learnt simultaneously,...
We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. The existing one-shot method, however, is hard to train and not yet effective on large scale datasets like ImageNet. This work proposes a Single Path One-Shot model to address the challenge in the training. Our central idea is to construct a simplified supernet, where all architectures are single paths so that the weight co-adaption problem is alleviated. Training is performed by uniform path sampling. All architectures (and their...
This paper considers the vehicle re-identification (re-ID) problem. The extreme viewpoint variation (up to 180 degrees) poses great challenges for existing approaches. Inspired by the behavior in the human recognition process, we propose a novel viewpoint-aware metric learning approach. It learns two metrics for similar viewpoints and different viewpoints in two feature spaces, respectively, giving rise to a viewpoint-aware network (VANet). During training, two types of constraints are applied jointly. During inference, the viewpoint is firstly estimated...
Previous learning based hand pose estimation methods do not fully exploit the prior information in hand model geometry. Instead, they usually rely on a separate model fitting step to generate valid hand poses. Such post processing is inconvenient and sub-optimal. In this work, we propose a model based deep learning approach that adopts a forward kinematics based layer to ensure the geometric validity of estimated poses. For the first time, we show that embedding such a non-linear generative process in deep learning is feasible for hand pose estimation. Our approach is verified on challenging public datasets...
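To show why a forward kinematics step yields geometrically valid poses, here is a minimal Python sketch for a planar chain. The 2D simplification, the function name, and the example bone lengths and angles are assumptions for illustration; an actual hand model is 3D and has a branching skeleton.

```python
import math

def forward_kinematics_2d(bone_lengths, joint_angles):
    """Map joint angles to joint positions for a planar chain rooted at the origin.

    Each angle is relative to the previous bone. Bone lengths are fixed model
    parameters, so any angle input produces a pose with valid bone lengths.
    """
    x, y, theta = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        theta += angle
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        positions.append((x, y))
    return positions

# Hypothetical two-bone finger: first bone along x, second bent by 90 degrees.
pose = forward_kinematics_2d([1.0, 1.0], [0.0, math.pi / 2])
```

Since the mapping is composed of sines, cosines, and sums, it is differentiable with respect to the angles, which is what makes using it as a network layer plausible.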
Random forest is well known as one of the best learning methods. In spite of its great success, it also has certain drawbacks: the heuristic learning rule does not effectively minimize the global training loss; the model size is usually too large for many real applications. To address these issues, we propose two techniques, global refinement and global pruning, to improve a pre-trained random forest. The proposed global refinement jointly relearns the leaf nodes of all trees under a global objective function so that the complementary information between multiple trees is well exploited....