- 3D Shape Modeling and Analysis
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Computer Graphics and Visualization Techniques
- Hand Gesture Recognition Systems
- 3D Surveying and Cultural Heritage
- Robot Manipulation and Learning
- Human Motion and Animation
- Image Processing and 3D Reconstruction
- Video Surveillance and Tracking Methods
- Robotics and Sensor-Based Localization
- Domain Adaptation and Few-Shot Learning
- Image and Object Detection Techniques
- Advanced Numerical Analysis Techniques
- Teleoperation and Haptic Systems
- Multimodal Machine Learning Applications
- Augmented Reality Applications
- Generative Adversarial Networks and Image Synthesis
- Advanced Neural Network Applications
- Interactive and Immersive Displays
- 3D Modeling in Geospatial Applications
- Music Technology and Sound Studies
- Muscle Activation and Electromyography Studies
- Neural Networks and Applications
- Multimedia Communication and Technology
John Brown University
2020-2025
Brown University
2022
Stanford University
2017-2020
Max Planck Institute for Informatics
2014-2017
R.V. College of Engineering
2017
Max Planck Society
2012-2015
Irvine University
2003
The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. Contrary to "instance-level" 6D pose estimation tasks, our problem assumes that no exact CAD models are available during either training or testing time. To handle different and unseen object instances in a given category, we introduce Normalized Object Coordinate Space (NOCS), a shared canonical representation for all possible object instances within a category. Our region-based neural network is then trained to directly infer the correspondence from...
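As a rough illustration of the shared canonical space idea, the sketch below normalizes a point cloud into a zero-centered cube, the kind of scale/translation normalization a NOCS-style representation builds on. `to_nocs` is an illustrative name, not the paper's code, and category-level orientation alignment is omitted.

```python
import numpy as np

def to_nocs(points):
    """Map an (N, 3) object point cloud into a canonical frame:
    zero-centered and scaled by the bounding-box diagonal, so all
    coordinates land inside [-0.5, 0.5]^3 (orientation alignment omitted)."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    center = (mins + maxs) / 2.0
    scale = np.linalg.norm(maxs - mins)  # bounding-box diagonal length
    return (points - center) / scale
```

Because the diagonal bounds every per-axis extent, the normalized coordinates always fit the unit cube regardless of the object's original size.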
We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN we propose a novel approach for the synthetic generation of training data based on a geometrically consistent image-to-image translation network. To be more...
We propose a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose-maps (ORPM) which enable full body pose inference even under strong partial occlusions by other people and objects in the scene. ORPM outputs a fixed number of maps which encode the 3D joint locations of all people in the scene. Body part associations [8] allow us to infer the poses of an arbitrary number of people without explicit bounding box prediction. To train our approach we introduce MuCo-3DHP, the first large...
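Pose-map style outputs are typically read out by taking a per-channel argmax over a stack of joint heatmaps. The sketch below shows that standard readout under the assumption of one peak per channel; `joints_from_heatmaps` is an illustrative helper, and sub-pixel refinement is omitted.

```python
import numpy as np

def joints_from_heatmaps(maps):
    """Read one 2D joint location per heatmap channel.
    maps: array of shape (J, H, W), one channel per joint.
    Returns (J, 2) integer pixel coordinates (x, y) at each channel's argmax."""
    J, H, W = maps.shape
    flat = maps.reshape(J, -1).argmax(axis=1)   # flat index of each peak
    ys, xs = np.unravel_index(flat, (H, W))     # back to row/column indices
    return np.stack([xs, ys], axis=1)
```

In practice multi-person readout also needs the body part associations mentioned above to assign peaks to individuals; this sketch covers only the single-peak-per-map case.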
Recent advances in machine learning have led to increased interest in solving visual computing problems using methods that employ coordinate-based neural networks. These methods, which we call neural fields, parameterize physical properties of scenes or objects across space and time. They have seen widespread success in problems such as 3D shape and image synthesis, animation of human bodies, 3D reconstruction, and pose estimation. Rapid progress has led to numerous papers, but a consolidation of the discovered knowledge has not yet...
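A coordinate-based network in this sense is just a small MLP evaluated at spatial coordinates, usually after a Fourier-feature positional encoding. The sketch below shows that pattern with plain numpy; the function names and the two-layer architecture are illustrative choices, not a specific method from the survey.

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Fourier-feature encoding used by many coordinate networks:
    each coordinate maps to [sin(2^k * pi * x), cos(2^k * pi * x)], k < num_freqs."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    angles = x[..., None] * freqs                          # (..., D, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)                  # (..., D * 2F)

def field(x, W1, b1, W2, b2):
    """A tiny coordinate-based MLP: encode coordinates, one hidden
    ReLU layer, linear output - e.g. a scalar density per 3D point."""
    h = np.maximum(positional_encoding(x) @ W1 + b1, 0.0)
    return h @ W2 + b2
```

Training such a field means fitting the weights so `field(x, ...)` matches observed quantities (colors, occupancy, signed distance) at sampled coordinates; only the forward evaluation is sketched here.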
We present an approach for real-time, robust and accurate hand pose estimation from moving egocentric RGB-D cameras in cluttered real environments. Existing methods typically fail for hand-object interactions in cluttered scenes imaged from egocentric viewpoints, which are common in virtual or augmented reality applications. Our approach uses two subsequently applied Convolutional Neural Networks (CNNs) to localize the hand and regress 3D joint locations. Hand localization is achieved by using a CNN to estimate the 2D position of the hand center in the input, even in the presence...
Markerless tracking of hands and fingers is a promising enabler for human-computer interaction. However, adoption has been limited because of tracking inaccuracies, incomplete coverage of motions, low framerate, complex camera setups, and high computational requirements. In this paper, we present a fast method for accurately tracking rapid articulations of the hand using a single depth camera. Our algorithm uses a novel detection-guided optimization strategy that increases the robustness and speed of pose estimation. In the detection step, a randomized...
We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that...
This paper investigates an emerging input method enabled by progress in hand tracking: input through the free motion of fingers. The method is expressive, potentially fast, and usable across many settings, as it does not insist on physical contact or visual feedback. Our goal is to inform the design of high-performance input methods by providing a detailed analysis of the performance and anatomical characteristics of finger motion. We conducted an experiment using a commercially available sensor and report on speed, accuracy, individuation, movement ranges,...
This paper contributes a novel sensing approach to support on- and above-skin finger input for interaction on the move. WatchSense uses a depth sensor embedded in a wearable device to expand the input space to neighboring areas of skin and the space above it. Our approach addresses challenging camera-based tracking conditions, such as oblique viewing angles and occlusions. It can accurately detect fingertips, their locations, and whether they are touching the skin or hovering above it. It extends previous work that supported either mid-air or multitouch input by...
Humans universally dislike the task of cleaning up a messy room. If machines were to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches relied on human input to explicitly specify the goal state, or synthesized scenes from scratch - but such methods do not address rearrangement...
Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method that addresses these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this in a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For diversity, we use...
Augmented reality (AR) in automobiles has the potential to significantly alter the driver's user experience. Prototypes developed in academia and industry demonstrate a range of applications, from advanced driver assist systems to location-based information services. A user-centered process for creating and evaluating designs of AR displays helps explore what collaborative role AR should serve between the technologies of the automobile and the driver. In particular, we consider the nature of this role along three important perspectives:...
Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time. The main contributions include a generative tracking method which employs an implicit hand shape representation based on Sum of Anisotropic Gaussians (SAG), and a pose...
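The core of a Sum of Anisotropic Gaussians representation is evaluating a weighted mixture of Gaussians with full covariances at a query point. The sketch below shows that evaluation; `sag_density` is an illustrative name, and the attachment of Gaussians to a kinematic skeleton is omitted.

```python
import numpy as np

def sag_density(p, means, covs, weights):
    """Evaluate an implicit shape given by a Sum of Anisotropic Gaussians:
    sum_i w_i * exp(-0.5 * (p - mu_i)^T Sigma_i^{-1} (p - mu_i)).
    Each Sigma_i is a full 3x3 covariance, so individual blobs can be
    stretched along arbitrary axes (e.g. finger segments)."""
    total = 0.0
    for mu, cov, w in zip(means, covs, weights):
        d = p - mu
        total += w * np.exp(-0.5 * (d @ np.linalg.solve(cov, d)))
    return total
```

Because the density is smooth and differentiable in the Gaussian parameters, pose optimization against such a representation can use gradient-based solvers, which is part of what makes this family of trackers fast.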
We present ShapeCrafter, a neural network for recursive text-conditioned 3D shape generation. Existing methods to generate text-conditioned 3D shapes consume an entire text prompt in a single step. However, humans tend to describe shapes recursively - we may start with an initial description and progressively add details based on intermediate results. To capture this recursive process, we introduce a method to generate a shape distribution, conditioned on an initial phrase, that gradually evolves as more phrases are added. Since existing datasets are insufficient for training...
We propose CaSPR, a method to learn object-centric Canonical Spatiotemporal Point Cloud Representations of dynamically moving or evolving objects. Our goal is to enable information aggregation over time and the interrogation of object state at any spatiotemporal neighborhood in the past, observed or not. Different from previous work, CaSPR learns representations that support spacetime continuity, are robust to variable and irregularly spacetime-sampled point clouds, and generalize to unseen object instances. Our approach divides...
Progress in 3D object understanding has relied on manually "canonicalized" shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to canonicalize the 3D pose of full and partial point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, translation-invariant networks....
The majority of descriptor-based methods for geometric processing of non-rigid shapes rely on hand-crafted descriptors. Recently, learning-based techniques have been shown to be effective, achieving state-of-the-art results in a variety of tasks. Yet, even though these methods can in principle work directly on raw data, most still rely on handcrafted descriptors at the input layer. In this work, we wish to challenge this practice and use a neural network to learn descriptors directly from the raw mesh. To this end, we introduce two modules into our architecture. The first is...
Recent advancement in 2D image diffusion models has driven significant progress in text-guided texture synthesis, enabling realistic, high-quality texture generation from arbitrary text prompts. However, current methods usually focus on synthesizing textures for single static 3D objects, and struggle to handle entire families of shapes, such as those produced by procedural programs. Applying existing methods naively to each shape is too slow to support exploring different parameter settings at interactive rates, and also results...
3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging due to its discrete, unstructured, and permutation-invariant nature. In this work, we present a simple yet effective method to overcome these challenges. We utilize spherical mapping to transform 3DGS into a structured 2D representation, termed UVGS. UVGS can be viewed as multi-channel images, with feature dimensions formed by concatenating Gaussian attributes such as position, scale,...
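A spherical mapping of this kind assigns each 3D position a pixel on an equirectangular grid via its spherical angles, so unordered primitives can be laid out as image channels. The sketch below shows that coordinate transform only; `spherical_uv`, the grid resolution, and the nearest-pixel rounding are illustrative assumptions, and the handling of collisions between points that land on the same pixel is omitted.

```python
import numpy as np

def spherical_uv(points, H=64, W=128):
    """Map (N, 3) positions, assumed centered at the origin, to integer
    pixel coordinates (u, v) of an H x W equirectangular grid using the
    polar angle theta and azimuth phi of each point."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-12          # avoid divide-by-zero
    theta = np.arccos(np.clip(z / r, -1.0, 1.0))        # polar angle in [0, pi]
    phi = np.arctan2(y, x)                              # azimuth in (-pi, pi]
    u = ((phi + np.pi) / (2 * np.pi) * (W - 1)).round().astype(int)
    v = (theta / np.pi * (H - 1)).round().astype(int)
    return u, v
```

Once every Gaussian has a (u, v) address, its remaining attributes (scale, rotation, opacity, color) can be written into separate channels at that pixel, yielding the multi-channel image described above.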