- Advanced Vision and Imaging
- Computer Graphics and Visualization Techniques
- Image Enhancement Techniques
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- 3D Shape Modeling and Analysis
- Robotics and Sensor-Based Localization
- Advanced Image Processing Techniques
- Generative Adversarial Networks and Image Synthesis
- Optical Measurement and Interference Techniques
- Face Recognition and Analysis
- Image Processing Techniques and Applications
- Multimodal Machine Learning Applications
- Hand Gesture Recognition Systems
- Video Surveillance and Tracking Methods
- Tactile and Sensory Interactions
- Domain Adaptation and Few-Shot Learning
- Interactive and Immersive Displays
- Color Science and Applications
- Human Motion and Animation
- Image and Signal Denoising Methods
- Robot Manipulation and Learning
- Industrial Vision Systems and Defect Detection
- Augmented Reality Applications
- Advanced Data Compression Techniques
Google (United States)
2018-2024
Italian Institute of Technology
2012-2017
Perceptive Engineering (United Kingdom)
2017
Microsoft Research (United Kingdom)
2014-2016
Microsoft (United States)
2014-2016
University of Genoa
2014
Microsoft Research (India)
2014
Sapienza University of Rome
2010
We present an end-to-end system for augmented and virtual reality telepresence, called Holoportation. Our system demonstrates high-quality, real-time 3D reconstructions of an entire space, including people, furniture and objects, using a set of new depth cameras. These models can also be transmitted in real-time to remote users. This allows users wearing virtual or augmented reality displays to see, hear and interact with remote participants in 3D, almost as if they were present in the same physical space. From an audio-visual perspective, communicating and interacting with remote users edges...
We contribute a new pipeline for live multi-view performance capture, generating temporally coherent high-quality reconstructions in real-time. Our algorithm supports both incremental reconstruction, improving the surface estimation over time, as well as parameterizing the nonrigid scene motion. Our approach is highly robust to both large frame-to-frame motion and topology changes, allowing us to reconstruct extremely challenging scenes. We demonstrate advantages over related real-time techniques that either deform an...
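The incremental-reconstruction idea above rests on standard volumetric depth fusion. Below is a minimal sketch of that building block, assuming a truncated signed distance (TSDF) voxel grid and a per-voxel confidence weight; the full pipeline additionally warps the volume nonrigidly before fusing, which is not shown.

```python
# Minimal TSDF fusion sketch: each new depth observation is folded into a
# running per-voxel weighted average of truncated signed distances.
import numpy as np

def fuse(tsdf, weight, new_tsdf, new_weight=1.0):
    """Weighted running average; `tsdf` and `weight` are voxel-grid arrays."""
    w = weight + new_weight
    fused = (tsdf * weight + new_tsdf * new_weight) / np.maximum(w, 1e-8)
    return fused, w
```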
Efficient rendering of photo-realistic virtual worlds is a long-standing effort in computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning have given rise to a new approach to image synthesis and editing, namely deep...
This paper presents HITNet, a novel neural network architecture for real-time stereo matching. Contrary to many recent approaches that operate on a full cost volume and rely on 3D convolutions, our approach does not explicitly build a volume and instead relies on a fast multi-resolution initialization step, differentiable 2D geometric propagation and warping mechanisms to infer disparity hypotheses. To achieve a high level of accuracy, the network not only geometrically reasons about disparities but also infers slanted plane hypotheses...
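To make the slanted-plane idea concrete, here is a toy NumPy sketch of scoring one such hypothesis, assuming rectified grayscale images; `plane_disparity` and `patch_cost` are illustrative names, not the paper's implementation, and the network replaces this hand-written cost with learned, differentiable propagation.

```python
# Toy slanted-plane disparity hypothesis: d(x, y) = d0 + dx*x + dy*y over a
# patch, scored by sum of absolute differences against the right image.
import numpy as np

def plane_disparity(d0, dx, dy, patch=3):
    """Per-pixel disparities of a slanted plane over a patch centered at 0."""
    r = patch // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    return d0 + dx * xs + dy * ys

def patch_cost(left, right, x, y, plane):
    """SAD between a left patch and the right patch sampled at the
    hypothesized per-pixel disparities (rectified stereo assumed)."""
    r = plane.shape[0] // 2
    lp = left[y - r:y + r + 1, x - r:x + r + 1]
    cost = 0.0
    for j in range(plane.shape[0]):
        for i in range(plane.shape[1]):
            xr = int(round(x + i - r - plane[j, i]))  # right x = left x - disparity
            xr = int(np.clip(xr, 0, right.shape[1] - 1))
            cost += abs(float(lp[j, i]) - float(right[y + j - r, xr]))
    return cost
```

In a propagation step, hypotheses from neighboring pixels would be scored this way and the cheapest kept.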
We present "The Relightables", a volumetric capture system for photorealistic and high quality relightable full-body performance capture. While significant progress has been made on systems, focusing 3D geometric reconstruction with resolution textures, much less work done to recover photometric properties needed relighting. Results from such systems lack high-frequency details the subject's shading is prebaked into texture. In contrast, large body of addressed acquisition image-based...
We present Motion2Fusion, a state-of-the-art 360 performance capture system that enables *real-time* reconstruction of arbitrary non-rigid scenes. We provide three major contributions over prior work: 1) a new fusion pipeline allowing for far more faithful high frequency geometric details, avoiding the over-smoothing and visual artifacts observed previously; 2) high speed coupled with a machine learning technique for 3D correspondence field estimation, reducing tracking errors that are attributed to fast motions...
We present a novel machine learning based algorithm extending the interaction space around mobile devices. The technique uses only the RGB camera now commonplace on off-the-shelf mobile devices. Our algorithm robustly recognizes a wide range of in-air gestures, supporting user variation and varying lighting conditions. We demonstrate that our algorithm runs in real-time on unmodified devices, including resource-constrained smartphones and smartwatches. Our goal is not to replace the touchscreen as the primary input device, but rather to augment and enrich...
Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus in real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take a novel approach to augment such systems with a deep architecture that takes a rendering from an arbitrary viewpoint, and jointly performs completion, super resolution, and denoising of the imagery...
Augmented reality (AR) for smartphones has matured from a technology for earlier adopters, available only on select high-end phones, to one that is truly available to the general public. One of the key breakthroughs has been in low-compute methods for six degree of freedom (6DoF) tracking on phones using only existing hardware (camera and inertial sensors). 6DoF tracking is the cornerstone of smartphone AR, allowing virtual content to be precisely locked on top of the real world. However, to really give users the impression of believable AR, one requires mobile depth. Without...
Structured light sensors are popular due to their robustness to untextured scenes and multipath. These systems triangulate depth by solving a correspondence problem between each camera and projector pixel. This is often framed as a local stereo matching task, correlating patches of pixels in the observed and reference image. However, this is computationally intensive, leading to reduced accuracy and framerate. We contribute an algorithm for solving this correspondence problem efficiently, without compromising accuracy. For the first time, we cast...
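The key reframing here is to treat per-pixel correspondence as a learning problem rather than a patch search. As a hedged illustration of that framing (synthetic stand-in features and labels, not the paper's actual forest or training data), a toy per-pixel classifier over projector columns might look like:

```python
# Toy sketch: predict each pixel's projector column directly with a
# classifier, replacing the per-pixel stereo search.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_pixels, patch_dim, n_columns = 5000, 32, 128

# Stand-in features: flattened IR patches; stand-in labels: the projector
# column each pixel corresponds to, quantized into discrete classes.
X = rng.normal(size=(n_pixels, patch_dim))
y = np.clip(((X[:, 0] + 3) / 6 * n_columns).astype(int), 0, n_columns - 1)

clf = RandomForestClassifier(n_estimators=4, max_depth=12, random_state=0).fit(X, y)
columns = clf.predict(X[:10])  # independent per-pixel predictions; depth then
# follows by triangulating each camera pixel against its predicted column.
```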
We propose a novel system for portrait relighting and background replacement, which maintains high-frequency boundary details and accurately synthesizes the subject's appearance as lit by novel illumination, thereby producing realistic composite images for any desired scene. Our technique includes foreground estimation via alpha matting, relighting, and compositing. We demonstrate that each of these stages can be tackled in a sequential pipeline without the use of priors (e.g. known background or known illumination) and with no specialized...
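The final compositing stage is standard alpha blending; the sketch below shows that step, with `relight` as a placeholder for the learned relighting module (an assumption for illustration, not the paper's actual interface):

```python
# Alpha compositing: C = alpha * relight(F) + (1 - alpha) * B.
import numpy as np

def composite(foreground, alpha, background, relight=lambda f, b: f):
    """Blend a (re-lit) foreground over a background; images in [0, 1].
    `relight` is a stand-in for the learned relighting stage."""
    f = relight(foreground, background)   # match subject to target lighting
    a = alpha[..., None] if alpha.ndim == 2 else alpha
    return a * f + (1.0 - a) * background
```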
The light transport (LT) of a scene describes how it appears under different lighting conditions and from different viewing directions, and complete knowledge of a scene's LT enables the synthesis of novel views under arbitrary lighting. In this article, we focus on image-based LT acquisition, primarily for human bodies within a light stage setup. We propose a semi-parametric approach for learning a neural representation of the LT that is embedded in a texture atlas of known but possibly rough geometry. We model all non-diffuse and global LT as residuals added...
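A minimal per-texel sketch of that residual decomposition, assuming a Lambertian base term; `residual` stands in for the learned network output and the names are illustrative:

```python
# Residual light-transport model: simple diffuse base plus a learned residual.
import numpy as np

def render_texel(albedo, normal, light_dir, residual):
    """Diffuse base (clamped Lambertian n.l) plus a residual that accounts
    for non-diffuse and global effects."""
    base = albedo * max(float(np.dot(normal, light_dir)), 0.0)
    return base + residual
```

Anchoring the base in simple physics means the network only has to learn the harder, higher-order effects.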
We present a novel technique to relight images of human faces by learning a model of facial reflectance from a database of 4D reflectance field data of several subjects in a variety of expressions and viewpoints. Using our learned model, a face can be relit in arbitrary illumination environments using only two original images recorded under spherical color gradient illumination. The output of our deep network indicates that the gradient images contain the information needed to estimate the full reflectance field, including specular reflections and high frequency details. While...
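For background on why gradient images are so informative, the classical Lambertian intuition from earlier gradient-illumination work (not this paper's learned model) is that a linear spherical gradient along axis i yields I_i ≈ albedo·(n_i + 1)/2, so surface normals fall out of image ratios. A toy per-pixel version, assuming four grayscale captures:

```python
# Classical gradient-illumination intuition: normals from image ratios
# (Lambertian toy model only; the paper instead feeds two color gradient
# images to a deep network).
import numpy as np

def normals_from_gradients(full, grad_x, grad_y, grad_z, eps=1e-6):
    """full: image under uniform spherical light; grad_*: images under
    linear gradients along each axis. Returns unit normals per pixel."""
    n = np.stack([2 * grad_x / (full + eps) - 1,
                  2 * grad_y / (full + eps) - 1,
                  2 * grad_z / (full + eps) - 1], axis=-1)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + eps)
```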
We present FlexSense, a new thin-film, transparent sensing surface based on printed piezoelectric sensors, which can reconstruct complex deformations without the need for any external sensing, such as cameras. FlexSense provides a fully self-contained setup which improves mobility and is not affected by occlusions. Using only a sparse set of sensors printed on the periphery of the substrate, we devise two algorithms to fully reconstruct the deformation of the sheet, using only these sensor measurements. An evaluation shows that both proposed algorithms are capable of reconstructing...
We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real-time. We demonstrate our approach in a variety of human-computer interaction scenarios. Experiments show an accuracy that...
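One way to read "hybrid classification-regression" is a coarse-to-fine split: classify each pixel into a depth bin, then regress continuous metric depth within that bin. The sketch below illustrates that split on synthetic data (stand-in features, bins, and forests, not the paper's trained models):

```python
# Coarse-to-fine depth estimation sketch: classify into a depth bin, then
# refine with a per-bin regressor.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(4000, 16))                             # stand-in NIR patch features
depth = 0.2 + 0.6 * X[:, 0] + 0.02 * rng.normal(size=4000)   # metric depth (m)

bins = np.clip((depth * 4).astype(int), 0, 3)                # coarse depth bins
coarse = RandomForestClassifier(n_estimators=4, random_state=0).fit(X, bins)

fine = {b: RandomForestRegressor(n_estimators=4, random_state=0)
            .fit(X[bins == b], depth[bins == b]) for b in np.unique(bins)}

b_hat = coarse.predict(X[:5])
d_hat = np.array([fine[b].predict(x[None])[0] for b, x in zip(b_hat, X[:5])])
```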
Loss Functions for Neural Rendering (Jun-Yan Zhu)
The increasing demand for 3D content in augmented and virtual reality has motivated the development of volumetric performance capture systems such as the Light Stage. Recent advances are pushing free viewpoint relightable videos of dynamic human performances closer to photorealistic quality. However, despite significant efforts, these sophisticated systems are limited by reconstruction and rendering algorithms which do not fully model complex 3D structures and higher order light transport effects such as global...
Efficient estimation of depth from pairs of stereo images is one of the core problems in computer vision. We efficiently solve the specialized problem of stereo matching under active illumination using a new learning-based algorithm. This type of stereo, i.e. where the scene texture is augmented by an active light projector, is proving compelling for designing depth cameras, largely due to improved robustness when compared to time of flight or traditional structured light techniques. Our algorithm uses an unsupervised greedy optimization scheme that learns...
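One way to picture efficient learned matching is as a compact binary patch embedding compared with Hamming distance. The sketch below uses random hyperplanes as a stand-in for the learned ones (an assumption for illustration; the unsupervised scheme above is what actually learns them):

```python
# Binary patch embedding + Hamming matching sketch.
import numpy as np

rng = np.random.default_rng(1)
patch_dim, code_bits = 121, 32                 # 11x11 patches -> 32-bit codes
W = rng.normal(size=(code_bits, patch_dim))    # random stand-in for learned hyperplanes

def binarize(patches):
    """Map flattened patches to binary codes via hyperplane sign tests."""
    return (patches @ W.T) > 0

def hamming_match(code, candidate_codes):
    """Index of the candidate code with the smallest Hamming distance."""
    return int(np.argmin((code ^ candidate_codes).sum(axis=1)))

# Example: match one left patch against a row of right-image candidates.
left_code = binarize(rng.normal(size=(1, patch_dim)))[0]
right_codes = binarize(rng.normal(size=(200, patch_dim)))
best = hamming_match(left_code, right_codes)   # matched column -> disparity
```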
The light stage has been widely used in computer graphics for the past two decades, primarily to enable the relighting of human faces. By capturing the appearance of a subject under different light sources, one obtains the light transport matrix of that subject, which enables image-based relighting in novel environments. However, due to the finite number of lights in the stage, the matrix only represents a sparse sampling on the entire sphere. As a consequence, relighting with a point or directional source that does not coincide exactly with a stage light requires interpolation and resampling the images...
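Image-based relighting from a captured transport matrix is, at its core, a weighted sum of one-light-at-a-time (OLAT) images; a minimal sketch (illustrative, not the paper's super-resolution model):

```python
# Image-based relighting: a novel environment is a weighted combination of
# the OLAT basis images that form the light transport matrix.
import numpy as np

def relight(olat_images, light_weights):
    """olat_images: (n_lights, H, W, 3); light_weights: (n_lights,) obtained
    by sampling the target environment at each light-stage direction."""
    return np.tensordot(light_weights, olat_images, axes=1)
```

The sparsity problem described above is that a desired light direction falling between stage lights has no basis image of its own and must be approximated from nearby ones, which is the interpolation this work improves on.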
We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly compress the signs of the TSDF, which also upper bounds the reconstruction error by the voxel size. To compress the texture, we designed a fast UV parameterization, generating coherent texture maps...
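A hedged sketch of the sign/magnitude split described above (illustrative names; the actual magnitude codec is the learned block-based network): if the decoded field has the correct sign in every voxel, each zero crossing of the TSDF, i.e. the surface, stays inside the same voxel cell, which is what bounds the surface error by the voxel size.

```python
# Sign/magnitude split for TSDF compression: signs kept lossless, magnitudes
# compressed lossily.
import numpy as np

def encode(tsdf):
    signs = tsdf >= 0          # compressed losslessly (e.g., entropy coded)
    magnitudes = np.abs(tsdf)  # compressed lossily by the learned codec
    return signs, magnitudes

def decode(signs, magnitudes_hat):
    """Reattach exact signs to approximate magnitudes."""
    return np.where(signs, 1.0, -1.0) * np.abs(magnitudes_hat)
```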
We introduce Multiresolution Deep Implicit Functions (MDIF), a hierarchical representation that can recover fine geometry detail, while being able to perform global operations such as shape completion. Our model represents a complex 3D shape with a hierarchy of latent grids, which can be decoded into different levels of detail and also achieve better accuracy. For shape completion, we propose latent grid dropout to simulate partial data in the latent space and therefore defer the completion functionality to the decoder side. This, along with our...
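A minimal sketch of decoding such a hierarchy, assuming each level contributes a residual SDF term (coarse levels give global shape, finer levels add detail); the interpolation and decoders below are stand-ins, not the paper's architecture:

```python
# Hierarchical implicit decoding sketch: SDF(p) = sum of per-level residuals.
import numpy as np

def latent_at(grid, p):
    """Nearest-neighbor stand-in for trilinear interpolation of a latent
    grid at a normalized query point p in [0, 1]^3."""
    res = np.array(grid.shape[:3])
    idx = np.clip((np.asarray(p) * (res - 1)).round().astype(int), 0, res - 1)
    return grid[tuple(idx)]

def decode_sdf(latent_grids, decoders, p):
    """Sum per-level decoded residuals. Dropping fine grids during training
    (grid dropout) forces coarse levels to learn shape completion."""
    return sum(dec(latent_at(g, p)) for g, dec in zip(latent_grids, decoders))
```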
We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduce over-smoothing and improve out-of-model synthesis, we predict local features anchored on the 3DMM geometry. These features are driven by the 3DMM deformation and interpolated in 3D space to yield...