- Advanced Vision and Imaging
- Image Enhancement Techniques
- Advanced Image Processing Techniques
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Algorithms and Data Compression
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Satellite Image Processing and Photogrammetry
- Robotics and Sensor-Based Localization
- Advanced Data Compression Techniques
- Robot Manipulation and Learning
- Advanced Sensor and Energy Harvesting Materials
- Hand Gesture Recognition Systems
- Virtual Reality Applications and Impacts
- Industrial Vision Systems and Defect Detection
- Interactive and Immersive Displays
- Advanced Wireless Communication Techniques
- Prosthetics and Rehabilitation Robotics
- Image Retrieval and Classification Techniques
- Image and Video Stabilization
- Human Motion and Animation
ETH Zurich
2021-2024
École Polytechnique Fédérale de Lausanne
2018
We present PatchmatchNet, a novel and learnable cascade formulation of Patchmatch for high-resolution multi-view stereo. With high computation speed and low memory requirement, PatchmatchNet can process higher resolution imagery and is more suited to run on resource-limited devices than competitors that employ 3D cost volume regularization. For the first time, we introduce an iterative multi-scale Patchmatch in an end-to-end trainable architecture and improve the Patchmatch core algorithm with a novel and learned adaptive propagation and evaluation...
We present IterMVS, a new data-driven method for high-resolution multi-view stereo. We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state. Ingesting multi-scale matching information, our model refines these distributions over multiple iterations and infers depth and confidence. To extract the depth maps, we combine traditional classification and regression in a novel manner. We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D. While being the most efficient method in both memory...
For industrial manufacturing, robots are required to work together with human counterparts on certain special occasions, where workers share their skills with robots. Intuitive human–robot interaction brings increasing safety challenges, which can be properly addressed by using sensor-based active control technology. In this article, we designed and fabricated a three-dimensional flexible robot skin made of the piezoresistive nanocomposite, based on the need for enhancement of security performance...
The success of the Neural Radiance Fields (NeRF) in novel view synthesis has inspired researchers to propose neural implicit scene reconstruction. However, most existing neural implicit reconstruction methods optimize per-scene parameters and therefore lack generalizability to new scenes. We introduce VolRecon, a novel generalizable implicit reconstruction method with Signed Ray Distance Function (SRDF). To reconstruct the scene with fine details and little noise, VolRecon combines projection features aggregated from multi-view features, and volume features interpolated...
Learning-based visual localization methods that use scene coordinate regression (SCR) offer the advantage of smaller map sizes. However, on datasets with complex illumination changes or image-level ambiguities, it remains a less robust alternative to feature matching methods. This work aims to close the gap. We introduce a covisibility graph-based global encoding learning and data augmentation strategy, along with a depth-adjusted reprojection loss to facilitate implicit triangulation. Additionally, we revisit...
We present a multi-sensor system for consistent 3D hand pose tracking and modeling that leverages the advantages of both wearable and optical sensors. Specifically, we employ a stretch-sensing soft glove and three IMUs in combination with an RGB-D camera. Different sensor modalities are fused based on the availability and confidence estimation, enabling seamless hand tracking in challenging environments with partial or even complete occlusion. To maximize the accuracy while maintaining high ease-of-use, we propose an automated user...
Neural 3D scene representations have shown great potential for 3D reconstruction from 2D images. However, reconstructing real-world captures of complex scenes still remains a challenge. Existing generic 3D reconstruction methods often struggle to represent fine geometric details and do not adequately model reflective surfaces of large-scale scenes. Techniques that explicitly focus on reflective surfaces can model complex and detailed reflections by exploiting better reflection parameterizations. However, we observe that these methods are often not robust in real unbounded scenarios...
Scene coordinate regression (SCR) methods are a family of visual localization methods that directly regress 2D-3D matches for camera pose estimation. They are effective in small-scale scenes but face significant challenges in large-scale scenes that are further amplified in the absence of ground truth 3D point clouds for supervision. Here, the model can only rely on reprojection constraints and needs to implicitly triangulate the points. The challenges stem from a fundamental dilemma: the network has to be invariant to observations of the same landmark at different...
3D reconstruction aims to recover the dense 3D structure of a scene. It plays an essential role in various applications such as Augmented/Virtual Reality (AR/VR), autonomous driving and robotics. Leveraging multiple views of a scene captured from different viewpoints, Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments. Due to its efficiency and effectiveness, MVS has become a pivotal method for image-based 3D reconstruction. Recently, with the success...
Gaussian splatting and single/multi-view depth estimation are typically studied in isolation. In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation and study their interactions. More specifically, we first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features, leading to high-quality feed-forward 3D Gaussian splatting reconstructions. We also show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models from large-scale unlabelled datasets. We validate the synergy between...
3D textured shape recovery from partial scans is crucial for many real-world applications. Existing approaches have demonstrated the efficacy of implicit function representation, but they suffer from inputs with severe occlusions and varying object types, which greatly hinders their application value in the real world. This technical report presents our approach to address these limitations by incorporating learned geometric priors. To this end, we generate a SMPL model from pose prediction and fuse it into...