- Advanced Vision and Imaging
- Human Pose and Action Recognition
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Robotics and Sensor-Based Localization
- Human Motion and Animation
- Optical Measurement and Interference Techniques
- Generative Adversarial Networks and Image Synthesis
- Quantum Computing Algorithms and Architecture
- Graph Theory and Algorithms
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Hand Gesture Recognition Systems
- 3D Surveying and Cultural Heritage
- Advanced Memory and Neural Computing
- Advanced Image and Video Retrieval Techniques
- Face Recognition and Analysis
- Quantum Information and Cryptography
- Computational Geometry and Mesh Generation
- Robot Manipulation and Learning
- Advanced Image Processing Techniques
- Parallel Computing and Optimization Techniques
- CCD and CMOS Imaging Sensors
- Medical Image Segmentation Techniques
- Advanced MRI Techniques and Applications
Max Planck Institute for Informatics
2019-2025
Saarland University
2021-2023
Max Planck Society
2019-2021
University of Kaiserslautern
2016-2019
German Research Centre for Artificial Intelligence
2015-2018
We present Non-Rigid Neural Radiance Fields (NR-NeRF), a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes. Our method takes RGB images of the scene as input (e.g., from a monocular video recording) and creates a high-quality space-time geometry and appearance representation. We show that a single handheld consumer-grade camera is sufficient to synthesize sophisticated renderings of virtual views, e.g. a 'bullet-time' effect. NR-NeRF disentangles the dynamic scene into a canonical volume and its...
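NR-NeRF builds on the NeRF volume-rendering formulation, in which a pixel's colour is obtained by compositing density and colour samples along the camera ray. A minimal NumPy sketch of that compositing step (the function name and sample layout are illustrative, not the authors' code):

```python
import numpy as np

def volume_render(sigmas, rgbs, deltas):
    """Composite per-sample densities and colours along one camera ray
    using the standard NeRF quadrature.

    sigmas: (N,) non-negative volume densities at the N ray samples
    rgbs:   (N, 3) colours at the samples
    deltas: (N,) distances between adjacent samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)            # opacity of each segment
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                           # contribution per sample
    return (weights[:, None] * rgbs).sum(axis=0)       # final pixel colour

# A fully transparent ray (zero density everywhere) composites to black.
print(volume_render(np.zeros(8), np.ones((8, 3)), np.full(8, 0.1)))
```

In the full method, `sigmas` and `rgbs` come from querying a neural field at points warped into the canonical volume; here they are plain arrays.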
Abstract Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanied textures (e.g.,...
Motion capture from sparse inertial sensors has shown great potential compared to image-based approaches, since occlusions do not lead to a reduced tracking quality and the recording space is not restricted to be within the viewing frustum of a camera. However, capturing the motion and global position only from a sparse set of inertial sensors is inherently ambiguous and challenging. In consequence, recent state-of-the-art methods can barely handle very long period motions, and unrealistic artifacts are common due to the unawareness of physical constraints. To this end,...
Conventional methods for human motion synthesis have either been deterministic or have had to struggle with the trade-off between motion diversity and quality. In response to these limitations, we introduce MoFusion, i.e., a new denoising-diffusion-based framework for high-quality conditional human motion synthesis that can synthesise long, temporally plausible, and semantically accurate motions based on a range of conditioning contexts (such as music and text). We also present ways to introduce well-known kinematic losses for motion plausibility within...
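MoFusion belongs to the family of denoising-diffusion models, which generate a sample by iteratively removing predicted noise from a Gaussian-noise initialisation. A generic DDPM reverse loop, with a placeholder in place of the learned noise predictor and an assumed linear noise schedule (not MoFusion's actual schedule or network), sketches the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule (illustrative values only).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def ddpm_reverse_step(x_t, eps_pred, t):
    """One DDPM reverse step: subtract the noise the network predicted
    for timestep t, then add fresh noise (except at the final step t == 0)."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

# Toy "motion" tensor: 30 frames x 24 joints x 3 coordinates.
x = rng.standard_normal((30, 24, 3))
for t in reversed(range(T)):
    # A real model predicts the noise from (x, t, conditioning signal);
    # a zero placeholder stands in for it here.
    eps_pred = np.zeros_like(x)
    x = ddpm_reverse_step(x, eps_pred, t)
```

With a trained, conditioned noise predictor in place of the placeholder, the same loop yields a motion sequence consistent with the conditioning context.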
Abstract The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth, with relevant papers published across the computer graphics, computer vision, and AI communities and new works appearing...
Marker-less 3D human motion capture from a single colour camera has seen significant progress. However, it is still a very challenging and severely ill-posed problem. In consequence, even the most accurate state-of-the-art approaches have significant limitations. Purely kinematic formulations on the basis of individual joints or skeletons, and the frequent frame-wise reconstruction in state-of-the-art methods, greatly limit 3D accuracy and temporal stability compared to multi-view or marker-based motion capture. Further, captured 3D poses are often physically...
A high frame rate is a critical requirement for capturing fast human motions. In this setting, existing markerless image-based methods are constrained by the lighting requirement, the high data bandwidth and the consequent high computation overhead. In this paper, we propose EventCap, the first approach for 3D capturing of high-speed human motions using a single event camera. Our method combines model-based optimization and CNN-based human pose detection to capture high-frequency motion details and to reduce the drifting in the tracking. As a result, we can capture fast motions at millisecond...
We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub "physionical", is aware of physical and environmental constraints. It combines in a fully-differentiable way several key innovations, i.e., 1) a proportional-derivative controller, with gains predicted by a neural network, that reduces delays even in the presence of fast motions, 2) an explicit...
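The first listed innovation, a proportional-derivative (PD) controller, computes joint torques from the error between the current and target pose plus a damping term on the velocity. A toy single-joint sketch with fixed, hand-picked gains (in the described system the gains would be predicted by a network) shows the mechanism:

```python
def pd_torque(q, q_dot, q_target, kp, kd):
    """Proportional-derivative control torque for a joint angle.

    q, q_dot:  current joint angle and angular velocity
    q_target:  desired joint angle (e.g. from a kinematic pose estimate)
    kp, kd:    proportional and derivative gains (fixed here, illustrative)
    """
    return kp * (q_target - q) - kd * q_dot

# Simulate a single 1-DoF joint with unit inertia settling onto a target angle.
q, q_dot, dt = 0.0, 0.0, 1.0 / 240.0
for _ in range(2000):
    tau = pd_torque(q, q_dot, q_target=1.0, kp=50.0, kd=5.0)
    q_dot += tau * dt      # explicit Euler integration
    q += q_dot * dt
print(round(q, 3))  # settles near the target angle 1.0
```

Predicting `kp` and `kd` per joint and per frame lets a network trade responsiveness against overshoot depending on how fast the motion is.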
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses. Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene, resulting in temporal inconsistency and failures in preserving the identity and textures of the person. To address these limitations, we design a compositional neural network that predicts the silhouette, garment labels, and textures. Each modular network is explicitly dedicated to a subtask and can be learned...
Abstract Can we make virtual characters in a scene interact with their surrounding objects through simple instructions? Is it possible to synthesize such motion plausibly for a diverse set of objects and instructions? Inspired by these questions, we present the first framework to synthesize full-body human motions performing specified actions with 3D objects placed within reach. Our system takes textual instructions specifying the objects and the associated 'intentions' as input and outputs diverse sequences of full-body motions. This contrasts with existing works, where action synthesis methods generally...
Asynchronously operating event cameras find many applications due to their high dynamic range, vanishingly low motion blur, low latency and low data bandwidth. The field saw remarkable progress during the last few years, and existing event-based 3D reconstruction approaches recover sparse point clouds of the scene. However, such sparsity is a limiting factor in many cases, especially in computer vision and graphics, and it has not been addressed satisfactorily so far. Accordingly, this paper proposes the first approach for...
Human sensing and environment sensing are two important topics in Computer Vision and Graphics. Human motion is often captured by inertial sensors, while the environment is mostly reconstructed using cameras. We integrate the two techniques together in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors, including 6 inertial measurement units (IMUs) and a monocular phone camera. On the one hand, inertial mocap suffers from large translation drift due to the lack of a global positioning signal....
3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. The state-of-the-art methods directly regress 3D hand meshes from 2D depth images via convolutional neural networks, which leads to artefacts in the estimations due to perspective distortions in the images. In contrast, we propose a novel architecture with 3D convolutions trained in a weakly-supervised manner. The input to our method is a voxelized depth map, and we rely on two hand shape representations. The first one is a voxelized grid of the hand shape, which is accurate but does not...
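To illustrate what a voxelized depth map is, the sketch below back-projects a depth image through assumed pinhole intrinsics and bins the resulting 3D points into a binary occupancy grid; all parameter values are illustrative and unrelated to the paper's actual pipeline:

```python
import numpy as np

def voxelize_depth(depth, fx, fy, cx, cy, grid=32, extent=0.3):
    """Back-project a depth map into a binary occupancy voxel grid.

    depth:  (H, W) metric depth in metres (0 = no measurement)
    fx, fy, cx, cy: pinhole camera intrinsics (assumed values below)
    grid:   resolution of the cubic voxel grid
    extent: half-size in metres of the cube, centred on the point cloud
    """
    v, u = np.nonzero(depth)                 # pixel coordinates with valid depth
    z = depth[v, u]
    x = (u - cx) * z / fx                    # back-projection to camera space
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    centre = pts.mean(axis=0)
    idx = np.floor((pts - centre + extent) / (2 * extent) * grid).astype(int)
    vol = np.zeros((grid, grid, grid), dtype=bool)
    keep = ((idx >= 0) & (idx < grid)).all(axis=1)  # drop points outside the cube
    vol[tuple(idx[keep].T)] = True
    return vol

# Synthetic flat patch 0.5 m from the camera, standing in for a hand.
d = np.zeros((64, 64))
d[16:48, 16:48] = 0.5
occ = voxelize_depth(d, fx=60.0, fy=60.0, cx=32.0, cy=32.0)
```

Feeding such a grid to 3D convolutions avoids the perspective distortions that affect networks operating directly on the 2D depth image.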
Generative adversarial networks achieve great performance in photorealistic image synthesis in various domains, including human images. However, they usually employ latent vectors that encode the sampled outputs globally. This does not allow convenient control of semantically relevant individual parts of the image, and cannot draw samples that only differ in partial aspects, such as clothing style. We address these limitations and present a generative model for images of dressed humans offering control over pose, local body...
Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools, such as GANs, existing approaches yield noticeable artefacts, such as blurring of fine details, unrealistic distortions of the body parts and garments, as well as severe changes of the textures. We, therefore, propose...
Abstract 3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since, without additional prior assumptions, it permits infinitely many solutions leading to an accurate projection onto the input images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their...
Abstract Reconstructing models of the real world, including the 3D geometry, appearance, and motion of scenes, is essential for computer graphics and computer vision. It enables synthesizing photorealistic novel views, which is useful for the movie industry and AR/VR applications. It also facilitates the content creation necessary in computer games by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions to act and interact safely with the human world. Notably,...
Existing methods for the 4D reconstruction of general, non-rigidly deforming objects focus on novel-view synthesis and neglect correspondences. However, time consistency enables advanced downstream tasks like 3D editing, motion analysis, or virtual-asset creation. We propose SceNeRFlow to reconstruct a non-rigid scene in a time-consistent manner. Our dynamic-NeRF method takes multi-view RGB videos and background images from static cameras with known camera parameters as input. It then reconstructs...
Abstract Dynamic reconstruction and spatiotemporal novel-view synthesis of non-rigidly deforming scenes have recently gained increased attention. While existing work achieves impressive quality and performance on multi-view or teleporting camera setups, most methods fail to efficiently and faithfully recover motion and appearance from casual monocular captures. This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as a smartphone capture. Our approach represents the scene as a neural...
The problem of dense point set registration, given a sparse set of prior correspondences, often arises in computer vision tasks. Unlike in the rigid case, integrating prior knowledge into a registration algorithm is especially demanding in the non-rigid case due to the high variability of motion and deformation. In this paper, we present the Extended Coherent Point Drift (ECPD) algorithm. It enables, on the one hand, to couple correspondence priors into the registration procedure in a closed form and, on the other hand, to process large point sets in a reasonable time through adopting an...
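Coherent Point Drift treats the template point set as a Gaussian mixture fitted to the target set; its E-step computes soft correspondence probabilities, which is where correspondence priors can be coupled in. A minimal sketch of the standard (unextended) E-step on assumed toy data:

```python
import numpy as np

def cpd_estep(X, Y, sigma2, w=0.1):
    """Correspondence probabilities P[m, n] that template point Y[m]
    generated target point X[n], under the CPD Gaussian-mixture model.

    X: (N, D) target points, Y: (M, D) template points
    sigma2: current mixture variance, w: outlier weight in [0, 1)
    """
    M, N, D = Y.shape[0], X.shape[0], X.shape[1]
    d2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum(-1)   # (M, N) squared distances
    g = np.exp(-d2 / (2 * sigma2))
    # Denominator includes a uniform outlier component, as in the CPD model.
    denom = g.sum(0) + (2 * np.pi * sigma2) ** (D / 2) * w * M / ((1 - w) * N)
    return g / denom

rng = np.random.default_rng(1)
Y = rng.standard_normal((5, 2))
X = Y + 0.01 * rng.standard_normal((5, 2))   # slightly perturbed copy of the template
P = cpd_estep(X, Y, sigma2=0.05)
print(P.argmax(axis=0))  # each target is matched to its own template point
```

In the M-step, these probabilities weight a motion-coherence-regularized update of the template positions; ECPD additionally folds known sparse correspondences into this estimate.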