- Human Pose and Action Recognition
- Hand Gesture Recognition Systems
- Robot Manipulation and Learning
- Anomaly Detection Techniques and Applications
- Human Motion and Animation
- Tactile and Sensory Interactions
- Muscle activation and electromyography studies
- Gait Recognition and Analysis
- Gaze Tracking and Assistive Technology
- 3D Shape Modeling and Analysis
- Generative Adversarial Networks and Image Synthesis
- Domain Adaptation and Few-Shot Learning
- Video Analysis and Summarization
- Advanced Vision and Imaging
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Stroke Rehabilitation and Recovery
- Ergonomics and Musculoskeletal Disorders
- Aesthetic Perception and Analysis
- Multimodal Machine Learning Applications
- Medical Image Segmentation Techniques
- Color Science and Applications
META Health
2022-2024
Meta (Israel)
2020
Meta (United States)
2019-2020
ETH Zurich
2016-2019
Beijing University of Posts and Telecommunications
2013
In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable...
We present a simple and effective method for 3D hand pose estimation from a single depth frame. As opposed to previous state-of-the-art methods based on holistic 3D regression, our method works on dense pixel-wise estimation. This is achieved by careful design choices in pose parameterization, which leverages both the 2D and 3D properties of the depth map. Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps and unit 3D directional vector fields. The 2D/3D joint heat maps and 3D joint offsets are estimated via...
We present a system for real-time hand-tracking to drive virtual and augmented reality (VR/AR) experiences. Using four fisheye monochrome cameras, our system generates accurate and low-jitter 3D hand motion across a large working volume for a diverse set of users. We achieve this by proposing neural network architectures for detecting hands and estimating hand keypoint locations. Our hand detection network robustly handles a variety of real world environments. The keypoint estimation network leverages tracking history to produce spatially and temporally consistent...
State-of-the-art methods for 3D hand pose estimation from depth images require large amounts of annotated training data. We propose modelling the statistical relationship of 3D hand poses and corresponding depth images using two deep generative models with a shared latent space. By design, our architecture allows for learning from unlabeled image data in a semi-supervised manner. Assuming a one-to-one mapping between a pose and a depth map, any given point in the shared latent space can be projected into both a hand pose or a corresponding depth map. Regressing the hand pose can then be done by learning a discriminator to estimate...
We present a self-supervision method for 3D hand pose estimation from depth maps. We begin with a neural network initialized with synthesized data and fine-tune it on real but unlabelled depth maps by minimizing a set of data-fitting terms. By approximating the hand surface with a set of spheres, we design a differentiable hand renderer to align estimates by comparing the rendered and input depth maps. In addition, we place a set of priors including a data-driven term to further regulate the estimate's kinematic feasibility. Our method makes highly accurate estimates comparable to current...
AR/VR devices have started to adopt hand tracking, in lieu of controllers, to support user interaction. However, today's hand inputs rely primarily on one gesture: pinch. Moreover, current mappings of hand motion to use cases like VR locomotion and content scrolling involve more complex and larger arm motions than joystick or trackpad usage. STMG increases the gesture space by recognizing additional small thumb-based microgestures from skeletal tracking running on a headset. We take a machine learning approach and achieve...
State-of-the-art methods for 3D hand pose estimation from depth images require large amounts of annotated training data. We propose to model the statistical relationships of 3D hand poses and corresponding depth images using two deep generative models with a shared latent space. By design, our architecture allows for learning from unlabeled image data in a semi-supervised manner. Assuming a one-to-one mapping between a pose and a depth map, any given point in the shared latent space can be projected into both a hand pose and a corresponding depth map. Regressing the hand pose can then be done by learning a discriminator to estimate...
We present a hierarchical regression framework for estimating hand joint positions from single depth images based on local surface normals. The hierarchical regression follows the tree-structured topology of the hand from the wrist to the finger tips. We propose a conditional regression forest, i.e., the Frame Conditioned Regression Forest (FCRF), which uses a new normal difference feature. At each stage of the regression, the frame of reference is established from either the local surface normal or previously estimated hand joints. By making the regression with respect to the local frame, the pose estimation is more robust to rigid...
This paper investigates data-free class-incremental learning (DFCIL) for hand gesture recognition from 3D skeleton sequences. In this class-incremental learning (CIL) setting, while incrementally registering the new classes, we do not have access to the training samples (i.e. data-free) of the already known classes due to privacy. Existing DFCIL methods primarily focus on various forms of knowledge distillation for model inversion to mitigate catastrophic forgetting. Unlike SOTA methods, we delve deeper into the choice of the best samples for inversion. Inspired...
Detecting logos in real-world images is a highly challenging task due to a variety of viewpoint or lighting condition changes and real-time requirements in practice. Conventional object detection methods, e.g., the part-based model, may suffer from expensive computational cost if applied directly to this task. A promising alternative, the triangle structural descriptor associated with a matching strategy, offers an efficient way of recognizing logos. However, the descriptor fails under the rotation of logos that often occurs when...
In this paper, we propose a method for ranking fashion images to find the ones which might be liked by more people. We collect two new datasets from image sharing websites (Pinterest and Polyvore). We represent fashion images based on attributes: semantic attributes and data-driven attributes. To learn semantic attributes from limited training data, we use an algorithm on multi-task convolutional neural networks to share visual knowledge among different semantic attribute categories. To discover data-driven attributes in an unsupervised manner, we simultaneously discover clusters and learn fashion-specific...
We present a method for recovering the dense 3D surface of the hand by regressing the vertex coordinates of a mesh model from a single depth map. To this end, we use a two-stage 2D fully convolutional network architecture. In the first stage, the network estimates a dense correspondence field for every pixel on the depth map or image grid to the mesh grid. In the second stage, we design a differentiable operator to map features learned from the previous stage and regress a 3D coordinate map on the mesh grid. Finally, we sample from the mesh grid to recover the mesh vertices, and fit it to an articulated template mesh in closed form. During inference, the network can...
People often interact with their surroundings by applying pressure with their hands. While hand pressure can be measured by placing pressure sensors between the hand and the environment, doing so can alter contact mechanics, interfere with human tactile perception, require costly sensors, and scale poorly to large environments. We explore the possibility of using a conventional RGB camera to infer hand pressure, enabling machine perception of hand pressure from uninstrumented hands and surfaces. The central insight is that the application of pressure by a hand results in informative appearance changes....