- 3D Shape Modeling and Analysis
- Advanced Vision and Imaging
- Computer Graphics and Visualization Techniques
- Generative Adversarial Networks and Image Synthesis
- 3D Surveying and Cultural Heritage
- Robotics and Sensor-Based Localization
- Human Pose and Action Recognition
- Face Recognition and Analysis
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Advanced Numerical Analysis Techniques
- Optical Measurement and Interference Techniques
- Image Processing and 3D Reconstruction
- Digital Media Forensic Detection
- Remote Sensing and LiDAR Applications
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Speech and Audio Processing
- Computational Geometry and Mesh Generation
- Advanced Image Processing Techniques
- Human Motion and Animation
- Video Surveillance and Tracking Methods
- Adversarial Robustness in Machine Learning
- Hand Gesture Recognition Systems
- Industrial Vision Systems and Defect Detection
Technical University of Munich
2017-2024
Association for Computing Machinery
2021
Stanford University
2013-2020
ETH Zurich
2018
Courant Institute of Mathematical Sciences
2018
New York University
2018
Tel Aviv University
2018
Czech Academy of Sciences, Institute of Computer Science
2018
Intel (United States)
2018
Palo Alto University
2014-2015
A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available - current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture...
The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns for the implications towards society. At best, this leads to a loss of trust in digital content, but it could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated...
We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target videos using a dense photometric consistency measure. Reenactment is then achieved...
The modern computer graphics pipeline can synthesize images at remarkable visual quality; however, it requires well-defined, high-quality 3D content as input. In this work, we explore the use of imperfect 3D content, for instance, obtained from photo-metric reconstructions with noisy and incomplete surface geometry, while still aiming to produce photo-realistic (re-)renderings. To address this challenging problem, we introduce Deferred Neural Rendering, a new paradigm for image synthesis that combines...
Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose a self-supervised feature learning...
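3DMatch itself learns its descriptors with a 3D ConvNet; purely as an illustrative sketch, the downstream correspondence step (matching local patch descriptors between two partial scans) can be written as a mutual nearest-neighbour search in feature space. The function name is invented and the descriptors below are random stand-ins, not learned features:

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] are
    mutual nearest neighbours under Euclidean distance."""
    # Pairwise distance matrix between the two descriptor sets.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = d.argmin(axis=1)  # best match in B for each descriptor in A
    nn_ba = d.argmin(axis=0)  # best match in A for each descriptor in B
    # Keep only mutually consistent matches.
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 32))                        # 5 descriptors from scan A
b = a[[2, 0, 4]] + 0.01 * rng.normal(size=(3, 32))  # noisy copies of 3 of them
print(match_descriptors(a, b))  # -> [(0, 1), (2, 0), (4, 2)]
```

The mutual-consistency check discards descriptors in A that have no true counterpart in B, a common filter before feeding correspondences into a registration step.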
Online 3D reconstruction is gaining newfound interest due to the availability of real-time consumer depth cameras. The basic problem takes live overlapping depth maps as input and incrementally fuses these into a single 3D model. This is challenging, particularly when real-time performance is desired without trading quality or scale. We contribute an online system for large and fine scale volumetric reconstruction based on a memory and speed efficient data structure. Our system uses a simple spatial hashing scheme that compresses space and allows for real-time access...
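The spatial hashing idea described above can be illustrated with a small sketch: voxel blocks are allocated lazily in a hash map, so memory scales with the occupied surface rather than with the scene's bounding volume. This is a simplified, assumption-laden illustration (a Python dict stands in for the GPU hash table, and the class and method names are invented), not the paper's implementation:

```python
import numpy as np

# Large primes of the kind commonly used for 3D spatial hashing on the GPU;
# here we only demonstrate the hash, while storage uses a Python dict.
P1, P2, P3 = 73856093, 19349669, 83492791

def block_hash(x, y, z, table_size=2**20):
    """Hash integer block coordinates into a fixed-size table."""
    return ((x * P1) ^ (y * P2) ^ (z * P3)) % table_size

class SparseVoxelGrid:
    def __init__(self, block_size=8, voxel_size=0.01):
        self.blocks = {}  # block coordinate -> dense block of TSDF voxels
        self.block_size = block_size
        self.voxel_size = voxel_size

    def _block_coord(self, point):
        # World-space point -> integer voxel-block coordinate.
        extent = self.block_size * self.voxel_size
        return tuple((np.asarray(point) // extent).astype(int))

    def fuse(self, point, tsdf_value):
        # Allocate a block lazily, only where surface data actually exists.
        key = self._block_coord(point)
        if key not in self.blocks:
            self.blocks[key] = np.ones((self.block_size,) * 3, np.float32)
        # ... a real system would update the voxel with a running TSDF average ...

grid = SparseVoxelGrid()
grid.fuse((0.05, 0.02, 1.30), tsdf_value=0.4)
grid.fuse((0.06, 0.02, 1.30), tsdf_value=0.2)  # same block: no new allocation
print(len(grid.blocks))  # -> 1
```

The key property is that empty space costs nothing: only blocks touched by depth samples exist in the table.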
We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for...
Real-time, high-quality, 3D scanning of large-scale scenes is key to mixed reality and robotic applications. However, scalability brings challenges of drift in pose estimation, introducing significant errors in the accumulated model. Approaches often require hours of offline processing to globally correct model errors. Recent online methods demonstrate compelling results but suffer from (1) needing minutes to perform online correction, preventing true real-time use; (2) brittle frame-to-frame (or frame-to-model)...
We present a combined hardware and software solution for markerless reconstruction of non-rigidly deforming physical objects with arbitrary shape in real-time. Our system uses a single self-contained stereo camera unit built from off-the-shelf components and consumer graphics hardware to generate spatio-temporally coherent 3D models at 30 Hz. A new stereo matching algorithm estimates real-time RGB-D data. We start by scanning a smooth template model of the subject as they move rigidly. This geometric surface prior avoids strong...
We present a method for the real-time transfer of facial expressions from an actor in a source video to an actor in a target video, thus enabling ad-hoc control of the facial expressions of the target actor. The novelty of our approach lies in the transfer and photorealistic re-rendering of facial deformations and detail into the target video in a way that the newly-synthesized expressions are virtually indistinguishable from a real video. To achieve this, we accurately capture the facial performances of the source and target subjects in real-time using a commodity RGB-D sensor. For each frame, we jointly fit a parametric model for identity, expression, and skin reflectance to the input color...
Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep...
With recent advances in computer vision and graphics, it is now possible to generate videos with extremely realistic synthetic faces, even in real time. Countless applications are possible, some of which raise a legitimate alarm, calling for reliable detectors of fake videos. In fact, distinguishing between original and manipulated video can be a challenge for humans and computers alike, especially when the videos are compressed or have low resolution, as often happens on social networks. Research on the detection of face...
We introduce ScanComplete, a novel data-driven approach for taking an incomplete 3D scan of a scene as input and predicting a complete 3D model along with per-voxel semantic labels. The key contribution of our method is its ability to handle large scenes with varying spatial extent, managing the cubic growth in data size as the scene size increases. To this end, we devise a fully-convolutional generative 3D CNN model whose filter kernels are invariant to the overall scene size. The model can be trained on scene subvolumes but deployed on arbitrarily large scenes at test time. In...
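The size-invariance property (a convolution kernel trained on small subvolumes applies unchanged to arbitrarily large volumes, since its weights do not depend on input size) can be demonstrated with a naive 3D convolution. This is an illustration of the general principle only, not the ScanComplete architecture:

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D convolution with a single cubic kernel.
    Output size tracks input size; the kernel itself never changes."""
    k = kernel.shape[0]
    out_shape = tuple(s - k + 1 for s in volume.shape)
    out = np.zeros(out_shape)
    for x in range(out_shape[0]):
        for y in range(out_shape[1]):
            for z in range(out_shape[2]):
                out[x, y, z] = np.sum(volume[x:x+k, y:y+k, z:z+k] * kernel)
    return out

kernel = np.random.default_rng(1).normal(size=(3, 3, 3))
small = np.ones((8, 8, 8))     # e.g., a training subvolume
large = np.ones((32, 32, 32))  # e.g., a test-time scene: same kernel, no retraining
print(conv3d_valid(small, kernel).shape)  # (6, 6, 6)
print(conv3d_valid(large, kernel).shape)  # (30, 30, 30)
```

Because every layer in a fully-convolutional network has this property, the whole model is agnostic to the spatial extent of the input volume.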
Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings...
The computer graphics and computer vision communities have dedicated long-standing efforts to building computerized tools for reconstructing, tracking, and analyzing human faces based on visual input. Over the past years rapid progress has been made, which led to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB-D camera. The range of applications is vast and steadily growing as these technologies are further improving in speed,...
The advent of affordable consumer-grade RGB-D cameras has brought about a profound advancement of visual scene reconstruction methods. Both computer graphics and computer vision researchers spend significant effort to develop entirely new algorithms to capture comprehensive shape models of static and dynamic scenes with RGB-D cameras. This led to significant advances of the state of the art along several dimensions. Some methods achieve very high reconstruction detail, despite limited sensor resolution. Others even achieve real-time performance, yet...
Face2Face is an approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target videos using a dense photometric consistency measure. Reenactment is then achieved by fast...
Distinguishing manipulated from real images is becoming increasingly difficult as new sophisticated image forgery approaches come out by the day. Naive classification approaches based on Convolutional Neural Networks (CNNs) show excellent performance in detecting image manipulations when they are trained on a specific forgery method. However, on examples from unseen manipulation approaches, their performance drops significantly. To address this limitation in transferability, we introduce Forensic-Transfer (FT). We devise a learning-based...
Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanied textures (e.g.,...