- Human Pose and Action Recognition
- Hand Gesture Recognition Systems
- Robot Manipulation and Learning
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Optical Measurement and Interference Techniques
- 3D Shape Modeling and Analysis
- Video Surveillance and Tracking Methods
- Gait Recognition and Analysis
- Video Analysis and Summarization
- Human Motion and Animation
Nanyang Technological University
2016-2021
Despite great progress in 3D pose estimation from single-view images or videos, it remains a challenging task due to the substantial depth ambiguity and severe self-occlusions. Motivated by the effectiveness of incorporating spatial dependencies and temporal consistencies to alleviate these issues, we propose a novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections. Particularly, domain knowledge about the human hand (body) configurations is explicitly incorporated into the graph...
This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods in 3D hand analysis from monocular RGB images only focus on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface that contains richer information of both 3D hand shape and pose. To train networks with full supervision, we create a large-scale synthetic dataset containing both ground truth 3D meshes...
We propose a simple, yet effective approach for real-time hand pose estimation from single depth images using three-dimensional Convolutional Neural Networks (3D CNNs). Image based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation due to the lack of 3D spatial information. Our proposed 3D CNN taking a 3D volumetric representation of the hand depth image as input can capture the 3D spatial structure of the input and accurately regress full 3D hand pose in a single pass. In order to make the 3D CNN robust to variations in hand sizes and global orientations, we perform 3D data augmentation on...
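The idea of feeding a network a volumetric representation instead of a 2D image can be illustrated with a minimal sketch. It assumes the depth pixels have already been back-projected to 3D points, and it uses a plain binary occupancy grid; the abstract does not say which volumetric encoding the paper uses, so the function name `voxelize` and the occupancy encoding are illustrative assumptions only.

```python
def voxelize(points, resolution=32):
    """Map 3D points to a binary occupancy grid of shape (R, R, R).

    Assumption: `points` is a non-empty list of (x, y, z) tuples that were
    already segmented to the hand region.
    """
    if not points:
        raise ValueError("points must be non-empty")
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Use one cube edge for all axes so the aspect ratio of the hand is kept.
    edge = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    grid = [[[0] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    for p in points:
        idx = []
        for a in range(3):
            i = int((p[a] - mins[a]) / edge * resolution)
            idx.append(min(i, resolution - 1))  # clamp points on the far face
        grid[idx[0]][idx[1]][idx[2]] = 1
    return grid
```

A grid like this is what a 3D convolution would slide over; a real pipeline would typically store a richer per-voxel value than 0 or 1.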
Convolutional Neural Network (CNN) has shown promising results for 3D hand pose estimation in depth images. Different from existing CNN-based hand pose estimation methods that take either 2D images or 3D volumes as the input, our proposed Hand PointNet directly processes the 3D point cloud that models the visible surface of the hand for pose regression. Taking the normalized point cloud as the input, our proposed hand pose regression network is able to capture complex hand structures and accurately regress a low dimensional representation of the 3D hand pose. In order to further improve the accuracy of fingertips, we design...
Articulated hand pose estimation plays an important role in human-computer interaction. Despite the recent progress, the accuracy of existing methods is still not satisfactory, partially due to the difficulty of the embedded high-dimensional and non-linear regression problem. Different from the existing discriminative methods that regress for the hand pose with a single depth image, we propose to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress for 2D heat-maps which estimate the joint positions on each...
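The projection step described above can be sketched as follows. This is a simplified illustration, assuming the hand points have already been normalized into the unit cube; the coordinate frame, the map resolution, and the function name `project_three_views` are assumptions and are not taken from the paper.

```python
def project_three_views(points, size=4):
    """Project points in the unit cube onto the x-y, y-z and z-x planes.

    Each view is a size x size map that stores the smallest coordinate along
    the dropped axis (the surface nearest to that plane). Empty pixels are None.
    """
    # (row axis, column axis, dropped axis) for each view
    views = {"xy": (1, 0, 2), "yz": (2, 1, 0), "zx": (0, 2, 1)}
    maps = {name: [[None] * size for _ in range(size)] for name in views}
    for p in points:
        for name, (row_ax, col_ax, depth_ax) in views.items():
            r = min(int(p[row_ax] * size), size - 1)
            c = min(int(p[col_ax] * size), size - 1)
            current = maps[name][r][c]
            if current is None or p[depth_ax] < current:
                maps[name][r][c] = p[depth_ax]
    return maps
```

Each of the three maps could then be passed to a 2D network that predicts per-joint heat-maps for its view.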
In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, view point and articulation distributions. Our findings...
In this paper, we present a novel method for real-time 3D hand pose estimation from single depth images using 3D Convolutional Neural Networks (CNNs). Image-based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation due to the lack of 3D spatial information. Our proposed 3D CNN-based method, taking a 3D volumetric representation of the hand depth image as input and extracting 3D features from the volumetric input, can capture the 3D spatial structure of the hand and accurately regress full 3D hand pose in a single pass. In order to make the 3D CNN robust to variations in hand sizes and global orientations, we perform 3D data...
3D hand pose estimation has made significant progress recently, where Convolutional Neural Networks (CNNs) play a critical role. However, most of the existing CNN-based hand pose estimation methods depend much on the training set, while labeling 3D hand pose on training data is laborious and time-consuming. Inspired by the point cloud autoencoder presented in self-organizing network (SO-Net), our proposed SO-HandNet aims at making use of the unannotated data to obtain accurate 3D hand pose estimation in a semi-supervised manner. We exploit a hand feature encoder (HFE) to extract multi-level...
Articulated hand pose estimation is one of the core technologies in human-computer interaction. Despite the recent progress, most existing methods still cannot achieve satisfactory performance, partly due to the difficulty of the embedded high-dimensional nonlinear regression problem. Most existing data-driven methods directly regress 3D hand pose from a 2D depth image, which cannot fully utilize the depth information. In this paper, we propose a novel multi-view convolutional neural network (CNN)-based approach for 3D hand pose estimation. To better exploit...
Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to the substantial depth ambiguity and the difficulty of obtaining fully-annotated training data. Different from the existing learning-based monocular RGB-input approaches that require accurate 3D annotations for training, we propose to leverage the depth images that can be easily obtained from commodity RGB-D cameras during training, while during testing we take only RGB inputs for 3D joint predictions. In this way, we alleviate the burden of the costly 3D annotations in real-world...
Vision-based hand pose estimation is important in human-computer interaction. While many recent works focus on full degree-of-freedom hand pose estimation, robust estimation of global hand pose remains a challenging problem. This paper presents a novel algorithm to optimize the leaf weights in a Hough forest to assist global hand pose estimation with a single depth camera. Different from the traditional Hough forest, we propose to learn the vote weights stored at the leaf nodes in a principled way to minimize average pose prediction error, so that ambiguous votes are largely suppressed during prediction fusion....
This work proposes an end-to-end approach to estimate full 3D hand pose from stereo cameras. Most existing methods of estimating hand pose from stereo cameras apply stereo matching to obtain a depth map and use a depth-based solution to estimate hand pose. In contrast, we propose to bypass the stereo matching and directly estimate the 3D hand pose from the stereo image pairs. The proposed neural network architecture extends from any keypoint predictor to estimate the sparse disparity of the hand joints. In order to effectively train the model, we propose a large scale synthetic dataset that is composed of stereo image pairs and ground truth 3D hand pose annotations. Experiments show...
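To see why a sparse per-joint disparity is enough to recover 3D joint positions, the standard rectified-stereo relation Z = f * B / d can be applied to each joint. The sketch below assumes a rectified pinhole camera pair; the function name and the parameters `focal`, `baseline`, `cx` and `cy` are illustrative and do not come from the paper.

```python
def joint_from_disparity(u, v, disparity, focal, baseline, cx, cy):
    """Back-project one joint from left-image pixel (u, v) and its disparity.

    Assumes rectified stereo: focal length in pixels, baseline in metres,
    principal point (cx, cy) in pixels. Returns (x, y, z) in metres.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal * baseline / disparity  # depth from the stereo relation
    x = (u - cx) * z / focal          # pinhole back-projection
    y = (v - cy) * z / focal
    return (x, y, z)
```

With this relation, a network only needs to predict one 2D location and one disparity value per joint, rather than a dense depth map for the whole image.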