- Robotics and Sensor-Based Localization
- Advanced Neural Network Applications
- Robot Manipulation and Learning
- Human Pose and Action Recognition
- Image and Object Detection Techniques
- 3D Surveying and Cultural Heritage
- Hand Gesture Recognition Systems
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Image Processing and 3D Reconstruction
- Industrial Vision Systems and Defect Detection
- Augmented Reality Applications
- Optical Measurement and Interference Techniques
- 3D Shape Modeling and Analysis
- Visual Attention and Saliency Detection
- Handwritten Text Recognition Techniques
- Education, Psychology, and Social Research
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Forensic Anthropology and Bioarchaeology Studies
- Anatomy and Medical Technology
- Human Motion and Animation
- Domain Adaptation and Few-Shot Learning
- Speech and Dialogue Systems
- Manufacturing Process and Optimization
META Health (2022-2024)
Swiss Federal Institute of Metrology (2024)
Seattle University (2022)
Czech Technical University in Prague (2015-2020)
We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light...
We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image. The method is applicable to a broad range of objects, including challenging ones with global or partial symmetries. An object is represented by compact surface fragments which allow handling symmetries in a systematic manner. Correspondences between densely sampled pixels and the fragments are predicted using an encoder-decoder network. At each pixel, the network predicts: (i) the probability of the object's presence,...
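The final stage of such a correspondence-based pipeline is typically a robust PnP fit. The sketch below is an illustration rather than the authors' implementation: it shows how a 6D pose could be recovered with OpenCV's PnP-RANSAC from per-pixel 2D-3D correspondences; the function name and the confidence threshold are assumptions.

```python
# Minimal sketch (not the authors' code): once a network has produced, for each
# pixel, 3D coordinates of the corresponding point on the object model, the 6D
# pose can be recovered with PnP-RANSAC. The threshold values are illustrative.
import numpy as np
import cv2

def estimate_pose_from_correspondences(pixels_2d, points_3d, confidences, K,
                                        conf_thresh=0.5):
    """Recover object rotation and translation from 2D-3D correspondences.

    pixels_2d:   (N, 2) pixel coordinates predicted to lie on the object
    points_3d:   (N, 3) corresponding 3D points on the object model
    confidences: (N,)   per-correspondence confidence from the network
    K:           (3, 3) camera intrinsic matrix
    """
    keep = confidences > conf_thresh
    obj_pts = points_3d[keep].astype(np.float64).reshape(-1, 1, 3)
    img_pts = pixels_2d[keep].astype(np.float64).reshape(-1, 1, 2)
    if len(obj_pts) < 6:
        return None  # not enough support for a reliable pose

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, distCoeffs=None,
        reprojectionError=3.0, iterationsCount=200,
        flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix
    return R, tvec               # 6D pose: rotation + translation
```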
We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) the 3D object models are rendered in complete scenes with realistic materials and lighting, (2) a plausible geometric configuration of objects and cameras in a scene is generated using physics simulation, and (3) high photorealism of the synthesized images is achieved by physically based rendering. When trained on images synthesized by the proposed approach, the Faster...
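Ingredient (2), arranging objects with a physics simulation, can be illustrated with a short sketch. The code below is only a schematic example using the pybullet library, not the paper's pipeline; the URDF paths and drop heights are placeholders.

```python
# Illustrative sketch: use a physics simulation to obtain plausible, settled
# object poses before rendering. Mesh paths and parameters are placeholders.
import pybullet as p
import pybullet_data

def drop_objects_and_get_poses(mesh_urdfs, steps=500):
    p.connect(p.DIRECT)                       # headless physics simulation
    p.setAdditionalSearchPath(pybullet_data.getDataPath())
    p.setGravity(0, 0, -9.81)
    p.loadURDF("plane.urdf")                  # ground plane of the scene

    body_ids = []
    for i, urdf in enumerate(mesh_urdfs):
        # Drop each object from a slightly different position so the objects
        # settle into a physically plausible, non-intersecting arrangement.
        body_ids.append(p.loadURDF(urdf, basePosition=[0.1 * i, 0, 0.3 + 0.1 * i]))

    for _ in range(steps):
        p.stepSimulation()

    poses = [p.getBasePositionAndOrientation(b) for b in body_ids]
    p.disconnect()
    return poses  # (position, quaternion) per object, handed to the renderer
```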
Despite their ubiquitous presence, texture-less objects present significant challenges to contemporary visual object detection and localization algorithms. This paper proposes a practical method for the detection and accurate 3D localization of multiple texture-less and rigid objects depicted in RGB-D images. The detection procedure adopts the sliding window paradigm, with an efficient cascade-style evaluation of each window location. A simple pre-filtering is performed first, rapidly rejecting most of the locations. For each remaining location, a set of candidate templates (i.e....
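The cascade idea can be sketched in a few lines. The example below is a simplified stand-in, not the paper's procedure: the pre-filter and the similarity function are illustrative placeholders, and templates are assumed to be window-sized depth patches.

```python
# Schematic sketch of a cascade-style sliding-window evaluation: a cheap
# pre-filter discards most window locations, and only the surviving windows
# are compared against candidate templates.
import numpy as np

def normalized_similarity(patch, tmpl):
    # Placeholder similarity: normalized cross-correlation of depth patches.
    a = (patch - patch.mean()) / (patch.std() + 1e-6)
    b = (tmpl - tmpl.mean()) / (tmpl.std() + 1e-6)
    return float((a * b).mean())

def detect_cascade(depth_image, templates, window=96, stride=8,
                   prefilter_thresh=1e-4, score_thresh=0.8):
    """templates: list of (object_id, window-sized depth template)."""
    detections = []
    H, W = depth_image.shape
    for y in range(0, H - window, stride):
        for x in range(0, W - window, stride):
            patch = depth_image[y:y + window, x:x + window]

            # Stage 1: cheap pre-filter, e.g. reject nearly flat regions that
            # cannot contain an object standing out from the background.
            if np.var(patch) < prefilter_thresh:
                continue

            # Stage 2: evaluate candidate templates only at surviving windows.
            for obj_id, tmpl in templates:
                score = normalized_similarity(patch, tmpl)
                if score > score_thresh:
                    detections.append((obj_id, x, y, score))
    return detections
```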
This paper proposes a do-it-all neural model of human hands, named LISA. The model can capture accurate hand shape and appearance, generalize to arbitrary subjects, provide dense surface correspondences, be reconstructed from images in the wild, and be easily animated. We train LISA by minimizing the shape and appearance losses on a large set of multi-view RGB image sequences annotated with coarse 3D poses of the hand skeleton. For a 3D point in local coordinates, our model predicts the color and the signed distance with respect to each hand bone independently, and then...
We present the evaluation methodology, datasets and results of the BOP Challenge 2022, the fourth in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB/RGB-D image. In 2022 we witnessed another significant improvement in accuracy – the state of the art, which was 56.9 AR_C in 2019 (Vidal et al.) and 69.8 AR_C in 2020 (CosyPose), moved to new heights of 83.7 AR_C (GDRNPP). Out of 49...
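For readers unfamiliar with the AR_C score, the snippet below sketches how such an average-recall score is aggregated: a pose estimate counts as correct if its error is below a threshold, and recall is averaged over a range of thresholds and over the pose-error functions (VSD, MSSD, MSPD). This is a simplified illustration with approximate threshold ranges; the BOP toolkit defines the authoritative procedure.

```python
# Simplified sketch of an average-recall aggregation in the spirit of BOP.
import numpy as np

def average_recall(errors, thresholds):
    """Fraction of ground-truth annotations whose best estimate has an error
    below a threshold, averaged over a range of thresholds."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean([(errors < t).mean() for t in thresholds]))

# Example: normalized pose errors (e.g. MSSD divided by the object diameter),
# evaluated at thresholds from 5% to 50%.
errors = np.array([0.02, 0.08, 0.31, 0.70])
print(average_recall(errors, np.arange(0.05, 0.51, 0.05)))

# The challenge score is then the mean of the per-metric scores, roughly:
# AR_C = (AR_VSD + AR_MSSD + AR_MSPD) / 3, averaged over the datasets.
```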
We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of egocentric activities with challenging hand-object interactions. The dataset includes synchronized egocentric and exocentric images sampled from the recent Assembly101 dataset, in which participants assemble and disassemble take-apart toys. To obtain high-quality 3D hand pose annotations for the egocentric images, we develop an efficient pipeline, where we use an initial set of manual annotations to train a model to automatically annotate a much larger...
Real-time tracking of 3D hand pose in world space is a challenging problem and plays an important role in VR interaction. Existing work in this space is limited to either producing root-relative (versus world-space) pose or relying on multiple stages, such as generating heatmaps and kinematic optimization, to obtain the pose. Moreover, the typical VR scenario, which involves multi-view tracking from wide field of view (FOV) cameras, is seldom addressed by these methods. In this paper, we present a unified end-to-end differentiable framework for...
We propose a simple yet powerful method to segment novel objects in RGB images from their CAD models. Leveraging recent foundation models, Segment Anything and DINOv2, we generate segmentation proposals in the input image and match them against object templates that are pre-rendered using the CAD models. The matching is realized by comparing DINOv2 cls tokens of the proposed regions and the templates. The output is a set of segmentation masks associated with per-object confidences defined by the matching scores. We experimentally demonstrate that the method achieves state-of-the-art...
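A minimal sketch of the matching step, assuming precomputed DINOv2 CLS tokens for the proposals and the templates; the function and variable names below are illustrative, not the authors' code.

```python
# Sketch: assign each segmentation proposal to an object by cosine similarity
# between its DINOv2 CLS token and the CLS tokens of pre-rendered templates.
import torch
import torch.nn.functional as F

def match_proposals_to_objects(proposal_feats, template_feats, template_obj_ids):
    """
    proposal_feats:   (P, D) CLS tokens of the segmentation proposals
    template_feats:   (T, D) CLS tokens of pre-rendered object templates
    template_obj_ids: (T,)   object ID of each template
    Returns, for each proposal, the best-matching object ID and a confidence.
    """
    p = F.normalize(proposal_feats, dim=1)
    t = F.normalize(template_feats, dim=1)
    sim = p @ t.T                               # (P, T) cosine similarities

    obj_ids = torch.unique(template_obj_ids)
    # Aggregate per object, e.g. mean similarity over that object's templates.
    per_obj = torch.stack(
        [sim[:, template_obj_ids == o].mean(dim=1) for o in obj_ids], dim=1)

    conf, best = per_obj.max(dim=1)             # (P,) confidence and object index
    return obj_ids[best], conf
```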
We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. Our method relies on a neural implicit surface representation that captures both the geometry and the appearance of the object; however, by contrast with most NeRF-based methods, we do not assume that the camera-object relative poses are known. Instead, we simultaneously optimize both the object shape and the pose trajectory. As a direct optimization over all shape and pose parameters is prone to fail without a coarse-level initialization, our incremental approach starts by splitting...
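The joint shape-and-pose optimization can be sketched as follows. This is a conceptual PyTorch example under strong simplifications, not the paper's implementation: the differentiable renderer is left as a placeholder (render_loss), and the incremental coarse-to-fine strategy mentioned above is omitted.

```python
# Conceptual sketch: the object is an implicit surface given by an MLP, and
# per-frame object poses are free variables optimized jointly with the network
# weights. render_loss stands in for a differentiable rendering loss.
import torch
import torch.nn as nn

class SDFNet(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1))               # signed distance at a 3D point

    def forward(self, x):
        return self.mlp(x)

def optimize_shape_and_poses(frames, render_loss, steps=1000):
    sdf = SDFNet()
    # Per-frame pose parameters: 3 for axis-angle rotation, 3 for translation.
    poses = nn.Parameter(torch.zeros(len(frames), 6))
    opt = torch.optim.Adam(list(sdf.parameters()) + [poses], lr=1e-3)

    for _ in range(steps):
        opt.zero_grad()
        # Sum the (placeholder) rendering loss over all frames; gradients flow
        # into both the shape network and the pose parameters.
        loss = sum(render_loss(sdf, poses[i], frame) for i, frame in enumerate(frames))
        loss.backward()
        opt.step()
    return sdf, poses
```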
We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 challenge introduced new variants of these tasks focused on objects unseen during training. In the new tasks, methods were required to learn new objects during a short onboarding stage (max 5 minutes, 1 GPU)...
Generating natural hand-object interactions in 3D is challenging as the resulting hand and object motions are expected to be physically plausible and semantically meaningful. Furthermore, generalization to unseen objects is hindered by the limited scale of available hand-object interaction datasets. We propose DiffH2O, a novel method to synthesize realistic one- or two-handed object interactions from provided text prompts and the geometry of the object. The method introduces three techniques that enable effective learning from limited data. First, we decompose the task into...
We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground-truth annotations including 3D poses of hands, cameras, and objects, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios...