Tomáš Hodaň

ORCID: 0000-0003-0576-9997
Research Areas
  • Robotics and Sensor-Based Localization
  • Advanced Neural Network Applications
  • Robot Manipulation and Learning
  • Human Pose and Action Recognition
  • Image and Object Detection Techniques
  • 3D Surveying and Cultural Heritage
  • Hand Gesture Recognition Systems
  • Advanced Image and Video Retrieval Techniques
  • Advanced Vision and Imaging
  • Image Processing and 3D Reconstruction
  • Industrial Vision Systems and Defect Detection
  • Augmented Reality Applications
  • Optical measurement and interference techniques
  • 3D Shape Modeling and Analysis
  • Visual Attention and Saliency Detection
  • Handwritten Text Recognition Techniques
  • Education, Psychology, and Social Research
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Forensic Anthropology and Bioarchaeology Studies
  • Anatomy and Medical Technology
  • Human Motion and Animation
  • Domain Adaptation and Few-Shot Learning
  • Speech and dialogue systems
  • Manufacturing Process and Optimization

META Health
2022-2024

Swiss Federal Institute of Metrology
2024

Seattle University
2022

Czech Technical University in Prague
2015-2020

We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light...

10.1109/wacv.2017.103 article EN 2017-03-01
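For readers unfamiliar with the term, a 6D pose combines a 3D rotation with a 3D translation. A minimal sketch (with hypothetical values, using NumPy) of applying such a pose to model points:

```python
import numpy as np

# A 6D pose = 3D rotation (here: 90 degrees about the Z axis) + 3D translation.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.1, 0.0, 0.5])  # translation in meters (hypothetical)

# Transform object-model points from model to camera coordinates: p' = R p + t.
model_points = np.array([[0.01, 0.00, 0.00],
                         [0.00, 0.02, 0.00]])
camera_points = model_points @ R.T + t
```

Estimating the pose of a texture-less object amounts to recovering `R` and `t` from image evidence alone, without distinctive surface texture to anchor correspondences.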

We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image. The method is applicable to a broad range of objects, including challenging ones with global or partial symmetries. An object is represented by compact surface fragments which allow handling symmetries in a systematic manner. Correspondences between densely sampled pixels and the fragments are predicted using an encoder-decoder network. At each pixel, the network predicts: (i) the probability of each object's presence,...

10.1109/cvpr42600.2020.01172 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) 3D models are rendered in complete scenes with realistic materials and lighting, (2) a plausible geometric configuration of objects and cameras in a scene is generated using physics simulation, and (3) high photorealism of the synthesized images is achieved by physically based rendering. When trained on images synthesized by the proposed approach, the Faster...

10.1109/icip.2019.8803821 article EN 2019 IEEE International Conference on Image Processing (ICIP) 2019-08-26

Despite their ubiquitous presence, texture-less objects present significant challenges to contemporary visual object detection and localization algorithms. This paper proposes a practical method for the detection and accurate 3D localization of multiple texture-less and rigid objects depicted in RGB-D images. The detection procedure adopts the sliding window paradigm, with an efficient cascade-style evaluation of each window location. A simple pre-filtering is performed first, rapidly rejecting most of the locations. For each remaining location, a set of candidate templates (i.e....

10.1109/iros.2015.7354005 article EN 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2015-09-01
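The cascade idea from the abstract, a cheap pre-filter that rejects most window locations so that the expensive template evaluation only runs on the survivors, can be sketched as follows. The pre-filter test (window intensity variance) and the template score (normalized cross-correlation) are placeholder stand-ins for illustration, not the paper's actual measures:

```python
import numpy as np

def cascade_detect(image, templates, stride=8, win=32, prefilter_thresh=0.2):
    """Sliding-window cascade: a cheap pre-filter rejects most locations;
    only the surviving windows are matched against candidate templates."""
    h, w = image.shape
    detections = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            window = image[y:y + win, x:x + win]
            # Stage 1: cheap pre-filter (placeholder: enough intensity variance).
            if window.std() < prefilter_thresh:
                continue
            # Stage 2: evaluate candidate templates (placeholder score:
            # correlation between the window and each template).
            for obj_id, tmpl in templates.items():
                score = float(np.corrcoef(window.ravel(), tmpl.ravel())[0, 1])
                if score > 0.9:
                    detections.append((obj_id, x, y, score))
    return detections
```

The point of the cascade structure is that stage 1 is orders of magnitude cheaper than stage 2, so overall cost stays near-linear in image size even with many templates.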

This paper proposes a do-it-all neural model of human hands, named LISA. The model can capture accurate hand shape and appearance, generalize to arbitrary subjects, provide dense surface correspondences, be reconstructed from images in the wild, and be easily animated. We train LISA by minimizing shape and appearance losses on a large set of multi-view RGB image sequences annotated with coarse 3D poses of the hand skeleton. For a 3D point in local hand coordinates, our model predicts the color and the signed distance with respect to each hand bone independently, and then...

10.1109/cvpr52688.2022.01988 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

We present the evaluation methodology, datasets and results of the BOP Challenge 2022, the fourth in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB/RGB-D image. In 2022 we witnessed another significant improvement in accuracy – the state of the art, which was 56.9 AR_C in 2019 (Vidal et al.) and 69.8 AR_C in 2020 (CosyPose), moved to a new height of 83.7 AR_C (GDRNPP). Out of 49...

10.1109/cvprw59228.2023.00279 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of egocentric activities with challenging hand-object interactions. The dataset includes synchronized egocentric and exocentric images sampled from the recent Assembly101 dataset, in which participants assemble and disassemble take-apart toys. To obtain high-quality 3D hand pose annotations for the egocentric images, we develop an efficient pipeline, where we use an initial set of manual annotations to train a model to automatically annotate a much larger...

10.1109/cvpr52729.2023.01249 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Real-time tracking of 3D hand pose in world space is a challenging problem and plays an important role in VR interaction. Existing work in this space is limited to either producing root-relative (versus world-space) 3D pose, or relies on multiple stages, such as generating heatmaps followed by kinematic optimization, to obtain the 3D pose. Moreover, the typical VR scenario, which involves multi-view tracking from wide field of view (FOV) cameras, is seldom addressed by these methods. In this paper, we present a unified end-to-end differentiable framework for...

10.1145/3550469.3555378 preprint EN 2022-11-29

We propose a simple yet powerful method to segment novel objects in RGB images from their CAD models. Leveraging recent foundation models, Segment Anything and DINOv2, we generate segmentation proposals in the input image and match them against object templates that are pre-rendered using the CAD models. The matching is realized by comparing DINOv2 cls tokens of the proposed regions and the templates. The output is a set of segmentation masks associated with per-object confidences defined by the matching scores. We experimentally demonstrate that the proposed method achieves state-of-the-art...

10.1109/iccvw60793.2023.00227 article EN 2023-10-02
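The cls-token matching step can be illustrated with a small sketch. The descriptors below are random stand-ins for DINOv2 cls tokens, and the aggregation (maximum cosine similarity over an object's templates) is one plausible choice for deriving a per-object confidence, not necessarily the paper's exact formulation:

```python
import numpy as np

def match_proposals(proposal_cls, template_cls_by_obj):
    """Assign each segmentation proposal the object whose pre-rendered
    templates have the most similar cls token (cosine similarity)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    P = normalize(proposal_cls)                      # (num_proposals, d)
    results = []
    for p in P:
        best_obj, best_score = None, -1.0
        for obj_id, templates in template_cls_by_obj.items():
            T = normalize(templates)                 # (num_templates, d)
            score = float(np.max(T @ p))             # best-matching template
            if score > best_score:
                best_obj, best_score = obj_id, score
        results.append((best_obj, best_score))       # object id + confidence
    return results
```

Because both proposals and templates live in the same descriptor space, no training on the target objects is needed; new objects only require rendering a fresh set of templates.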

We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. Our method relies on a neural implicit surface representation that captures both the geometry and the appearance of the object; however, in contrast to most NeRF-based methods, we do not assume that the camera-object relative poses are known. Instead, we simultaneously optimize both the object shape and the pose trajectory. As direct optimization over all shape and pose parameters is prone to fail without a coarse-level initialization, our incremental approach starts by splitting...

10.1109/cvpr52729.2023.01638 preprint EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 challenge introduced new variants of these tasks focused on objects unseen during training. In the new tasks, methods were required to learn new objects during a short onboarding stage (max 5 minutes, 1 GPU)...

10.48550/arxiv.2403.09799 preprint EN arXiv (Cornell University) 2024-03-14

Generating natural hand-object interactions in 3D is challenging as the resulting hand and object motions are expected to be physically plausible and semantically meaningful. Furthermore, generalization to unseen objects is hindered by the limited scale of available hand-object interaction datasets. We propose DiffH2O, a novel method to synthesize realistic, one- or two-handed object interactions from provided text prompts and the geometry of the object. The method introduces three techniques that enable effective learning from limited data. First, we decompose the task into...

10.48550/arxiv.2403.17827 preprint EN arXiv (Cornell University) 2024-03-26

We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground-truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios...

10.48550/arxiv.2406.09598 preprint EN arXiv (Cornell University) 2024-06-13