- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Robotics and Sensor-Based Localization
- Optical Measurement and Interference Techniques
- Advanced Image Processing Techniques
- Image Processing Techniques and Applications
- Image Enhancement Techniques
- Image Retrieval and Classification Techniques
- Advanced Neural Network Applications
- Visual Attention and Saliency Detection
- Video Analysis and Summarization
- 3D Surveying and Cultural Heritage
- Medical Image Segmentation Techniques
- Explainable Artificial Intelligence (XAI)
- Speech and Audio Processing
- Music and Audio Processing
- Multimodal Machine Learning Applications
- Computer Graphics and Visualization Techniques
- 3D Shape Modeling and Analysis
- Interactive and Immersive Displays
- Indoor and Outdoor Localization Technologies
- Video Coding and Compression Technologies
- Data Visualization and Analytics
- Domain Adaptation and Few-Shot Learning
- Cell Image Analysis Techniques
Google (United States)
2018-2023
Perceptive Engineering (United Kingdom)
2017
Microsoft Research (United Kingdom)
2016
Microsoft (United States)
2016
Cornell University
2009-2014
We present an end-to-end system for augmented and virtual reality telepresence, called Holoportation. Our system demonstrates high-quality, real-time 3D reconstructions of an entire space, including people, furniture and objects, using a set of new depth cameras. These models can also be transmitted in real-time to remote users. This allows users wearing virtual or augmented reality displays to see, hear and interact with remote participants in 3D, almost as if they were present in the same physical space. From an audio-visual perspective, communicating and interacting with remote users edges...
This paper presents an algorithm for Interactive Co-segmentation of a foreground object from a group of related images. While previous approaches focus on unsupervised co-segmentation, we use successful ideas from the interactive object-cutout literature. We develop an algorithm that allows users to decide what the foreground is, and then guide the output of the co-segmentation towards it via scribbles. Interestingly, keeping the user in the loop leads to simpler and highly parallelizable energy functions, allowing us to work with significantly more images...
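The abstract leans on scribble-guided energy terms without spelling them out; the NumPy sketch below is a toy stand-in showing how user scribbles might seed per-pixel foreground costs from colour histograms. The function name, histogram binning, and cost form are all assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def scribble_unaries(image, fg_mask, bg_mask, bins=16):
    """Per-pixel foreground cost from colour histograms fitted to user scribbles.

    image: HxWx3 uint8 array; fg_mask / bg_mask: HxW booleans marking scribbled
    pixels. Returns -log P(fg) + log P(bg), so negative values favour foreground.
    A toy stand-in, not the energy used in the paper.
    """
    quant = (image // (256 // bins)).astype(int).reshape(-1, 3)
    idx = quant[:, 0] * bins * bins + quant[:, 1] * bins + quant[:, 2]

    def hist(mask):
        # Laplace-smoothed colour histogram over the scribbled pixels.
        h = np.bincount(idx[mask.reshape(-1)], minlength=bins ** 3) + 1.0
        return h / h.sum()

    p_fg, p_bg = hist(fg_mask), hist(bg_mask)
    return (-np.log(p_fg[idx]) + np.log(p_bg[idx])).reshape(image.shape[:2])
```

In a full pipeline these unary costs would be combined with a pairwise smoothness term and minimized per image, which is what makes the scribble-seeded formulation easy to parallelize across the group.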
We contribute a new pipeline for live multi-view performance capture, generating temporally coherent high-quality reconstructions in real-time. Our algorithm supports both incremental reconstruction, improving the surface estimation over time, as well as parameterizing the nonrigid scene motion. Our approach is highly robust to large frame-to-frame motion and topology changes, allowing us to reconstruct extremely challenging scenes. We demonstrate advantages over related real-time techniques that either deform an...
This paper presents HITNet, a novel neural network architecture for real-time stereo matching. Contrary to many recent approaches that operate on a full cost volume and rely on 3D convolutions, our approach does not explicitly build a volume and instead relies on a fast multi-resolution initialization step, differentiable 2D geometric propagation and warping mechanisms to infer disparity hypotheses. To achieve a high level of accuracy, our network not only geometrically reasons about disparities but also infers slanted plane hypotheses...
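Since the abstract mentions slanted plane hypotheses and geometric warping without detail, here is a minimal NumPy sketch of what a slanted-plane disparity hypothesis and a simple 1D warp could look like. The tile size, centering convention, and function names are illustrative assumptions, not HITNet's actual implementation.

```python
import numpy as np

def tile_disparity(d0, dx, dy, tile_size=4):
    """Evaluate a slanted-plane disparity hypothesis over one tile.

    The plane is parameterized by the disparity d0 at the tile center and
    per-pixel slopes (dx, dy), so d(x, y) = d0 + dx * x + dy * y with (x, y)
    measured from the tile center.
    """
    half = (tile_size - 1) / 2.0
    ys, xs = np.mgrid[0:tile_size, 0:tile_size]
    return d0 + dx * (xs - half) + dy * (ys - half)

def warp_row(right_row, disparities):
    """Warp one image row toward the left view by sampling at x - d(x) with
    linear interpolation (zero padding outside the image)."""
    xs = np.arange(right_row.shape[0]) - disparities
    x0 = np.floor(xs).astype(int)
    frac = xs - x0

    def sample(idx):
        valid = (idx >= 0) & (idx < right_row.shape[0])
        out = np.zeros_like(right_row, dtype=float)
        out[valid] = right_row[idx[valid]]
        return out

    return (1 - frac) * sample(x0) + frac * sample(x0 + 1)
```

The point of the slanted-plane parameterization is that warping and upsampling can respect surfaces that are not fronto-parallel, rather than assuming one constant disparity per tile.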
We present "The Relightables", a volumetric capture system for photorealistic and high quality relightable full-body performance capture. While significant progress has been made on systems, focusing 3D geometric reconstruction with resolution textures, much less work done to recover photometric properties needed relighting. Results from such systems lack high-frequency details the subject's shading is prebaked into texture. In contrast, large body of addressed acquisition image-based...
We present Motion2Fusion, a state-of-the-art 360 performance capture system that enables *real-time* reconstruction of arbitrary non-rigid scenes. We provide three major contributions over prior work: 1) a new fusion pipeline allowing for far more faithful high frequency geometric details, avoiding the over-smoothing and visual artifacts observed previously. 2) high speed coupled with a machine learning technique for 3D correspondence field estimation, reducing tracking errors that are attributed to fast motions...
Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus on real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take a novel approach to augment such systems with a deep architecture that takes a rendering from an arbitrary viewpoint and jointly performs completion, super resolution, and denoising of the imagery...
Mobile devices with passive depth sensing capabilities are ubiquitous, and recently active depth sensors have become available on some tablets and AR/VR devices. Although real-time depth data is accessible, its rich value to mainstream AR applications has been sorely under-explored. Adoption of depth-based UX has been impeded by the complexity of performing even simple operations with raw depth data, such as detecting intersections or constructing meshes. In this paper, we introduce DepthLab, a software library that encapsulates...
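To make the "simple operations on raw depth data" concrete, the snippet below sketches a depth-based hit test: back-projecting a tapped pixel through pinhole intrinsics to get a 3D point. The helper names and signatures are hypothetical and are not the DepthLab API.

```python
import numpy as np

def unproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into a camera-space 3D point
    using a pinhole intrinsics model. Hypothetical helper, not DepthLab's API."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

def depth_hit_test(depth_map, u, v, fx, fy, cx, cy):
    """Return the 3D point 'hit' by a screen tap, or None if depth is missing."""
    d = float(depth_map[int(v), int(u)])
    if d <= 0.0:  # zero or negative values treated as invalid depth
        return None
    return unproject(u, v, d, fx, fy, cx, cy)
```

Even an operation this small requires handling missing depth and knowing the camera intrinsics, which is the kind of boilerplate a library can hide from AR developers.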
Augmented reality (AR) for smartphones has matured from a technology for earlier adopters, available only on select high-end phones, to one that is truly available to the general public. One of the key breakthroughs has been in low-compute methods for six degree of freedom (6DoF) tracking on phones using existing hardware (camera and inertial sensors). 6DoF tracking is the cornerstone of smartphone AR, allowing virtual content to be precisely locked on top of the real world. However, to really give users the impression of believable AR, one requires mobile depth. Without...
Structured light sensors are popular due to their robustness to untextured scenes and multipath. These systems triangulate depth by solving a correspondence problem between each camera and projector pixel. This is often framed as a local stereo matching task, correlating patches of pixels in the observed and reference image. However, this is computationally intensive, leading to reduced accuracy and framerate. We contribute an algorithm for solving this correspondence problem efficiently, without compromising accuracy. For the first time, we cast...
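As a reference point for the correspondence problem described above, a brute-force patch-correlation matcher, i.e. the kind of costly baseline this line of work aims to avoid, might look like the NumPy sketch below. Patch size, disparity range, and ZNCC scoring are arbitrary choices here, not the paper's method.

```python
import numpy as np

def zncc(a, b, eps=1e-6):
    """Zero-normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def match_pixel(observed, reference, y, x, max_disp=64, patch=7):
    """Brute-force correspondence search for one pixel: slide a patch from the
    observed image along the same row of the reference pattern and keep the
    offset with the highest ZNCC score. Assumes (y, x) lies at least patch//2
    pixels away from the image border."""
    r = patch // 2
    tmpl = observed[y - r:y + r + 1, x - r:x + r + 1]
    best_d, best_score = 0, -np.inf
    for d in range(max_disp):
        xr = x - d
        if xr - r < 0:
            break
        cand = reference[y - r:y + r + 1, xr - r:xr + r + 1]
        score = zncc(tmpl, cand)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

Running this search per pixel over the disparity range is what makes naive local matching expensive, hence the interest in learned or constant-time alternatives.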
The state of the art in articulated hand tracking has been greatly advanced by hybrid methods that fit a generative model to depth data, leveraging both temporally and discriminatively predicted starting poses. In this paradigm, the generative model is used to define an energy function, and a local iterative optimization is performed from these starting poses in order to find a "good minimum" (i.e. a minimum close to the true pose). Performing the optimization quickly is key to exploring more starting poses, performing more iterations and, crucially, exploiting high frame rates that ensure...
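The hybrid recipe above (an energy defined by a generative model, minimized locally from several predicted starting poses) can be sketched generically. The code below uses SciPy's L-BFGS-B on a toy energy purely to illustrate the multi-start loop; it is not the paper's hand model, energy, or optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def fit_from_starts(energy, starting_poses):
    """Multi-start local optimization: run a local optimizer from each predicted
    starting pose and keep the best minimum found. In a real tracker, 'energy'
    would compare a rendered hand model against the observed depth frame."""
    best = None
    for pose0 in starting_poses:
        res = minimize(energy, pose0, method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best

# Toy quadratic energy with a known minimum, just to exercise the loop.
target = np.array([0.3, -1.2, 0.7])
energy = lambda p: float(np.sum((p - target) ** 2))
starts = [np.zeros(3), np.ones(3), -np.ones(3)]
print(fit_from_starts(energy, starts).x)
```

The cheaper each local solve is, the more starting poses and iterations fit into a frame budget, which is exactly the speed argument made in the abstract.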
We present a novel technique to relight images of human faces by learning a model of facial reflectance from a database of 4D reflectance field data of several subjects in a variety of expressions and viewpoints. Using our learned model, a face can be relit in arbitrary illumination environments using only two original images recorded under spherical color gradient illumination. The output of our deep network indicates that the two images contain the information needed to estimate the full reflectance field, including specular reflections and high frequency details. While...
In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common evaluation workflows.
Efficient estimation of depth from pairs of stereo images is one of the core problems in computer vision. We efficiently solve the specialized problem of stereo matching under active illumination using a new learning-based algorithm. This type of stereo, i.e. where the scene texture is augmented by an active light projector, is proving compelling for designing depth cameras, largely due to improved robustness when compared to time of flight or traditional structured light techniques. Our algorithm uses an unsupervised greedy optimization scheme that learns...
Scene understanding includes many related subtasks, such as scene categorization, depth estimation, object detection, etc. Each of these subtasks is often notoriously hard, and state-of-the-art classifiers already exist for many of them. These classifiers operate on the same raw image and provide correlated outputs. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which jointly...
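A rough picture of cascading black-box classifiers, assuming scikit-learn and synthetic stand-in data: first-layer predictions for every subtask are appended to the features of a second layer, so correlations between subtasks can be exploited without modifying the original classifiers. The feedback/retraining step that gives FE-CCM its name is omitted from this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                    # stand-in image features
tasks = {"scene": (X[:, 0] > 0).astype(int),      # stand-in subtask labels
         "object": (X[:, 1] + X[:, 2] > 0).astype(int)}

# First layer: one off-the-shelf classifier per subtask, treated as a black box.
layer1 = {t: LogisticRegression().fit(X, y) for t, y in tasks.items()}
outputs = np.column_stack([m.predict_proba(X)[:, 1] for m in layer1.values()])

# Second layer: re-predict each subtask from the original features plus the
# correlated outputs of all first-layer classifiers.
X2 = np.hstack([X, outputs])
layer2 = {t: LogisticRegression().fit(X2, y) for t, y in tasks.items()}
```

Because the coupling happens only through concatenated outputs, the individual classifiers never need to be reimplemented, which is the property the abstract highlights.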
The advent of consumer depth cameras has incited the development of a new cohort of algorithms tackling challenging computer vision problems. The primary reason is that depth provides direct geometric information that is largely invariant to texture and illumination. As such, substantial progress has been made in human and object pose estimation, 3D reconstruction, and simultaneous localization and mapping. Most of these algorithms naturally benefit from the ability to accurately track an object or scene of interest from one frame to the next. However, commercially...
This paper presents an active-learning algorithm for piecewise planar 3D reconstruction of a scene. While previous interactive algorithms require the user to provide tedious interactions to identify all planes in the scene, we build on successful ideas from automatic reconstruction and introduce the idea of active learning, thereby improving the reconstructions while considerably reducing user effort. Our algorithm first attempts to obtain a reconstruction of the scene automatically through an energy minimization framework. The proposed approach then uses intuitive cues...
We introduce a realtime compression architecture for 4D performance capture that is two orders of magnitude faster than current state-of-the-art techniques, yet achieves comparable visual quality and bitrate. We note how much of the algorithmic complexity in traditional approaches arises from the necessity to encode geometry using an explicit model (i.e. a triangle mesh). In contrast, we propose an encoder that leverages an implicit representation (namely a Signed Distance Function) to represent the observed geometry, as well as its...
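To illustrate the implicit representation the encoder builds on, the sketch below fills a small truncated signed distance grid for a toy sphere and recovers an explicit mesh with marching cubes (scikit-image assumed available). The grid resolution, truncation band, and toy shape are arbitrary and nothing here reflects the paper's encoder.

```python
import numpy as np
from skimage import measure  # marching cubes for mesh extraction

# Build a small truncated signed distance grid for a sphere: each voxel stores
# its signed distance to the surface, clipped to +/- trunc. The geometry is
# thus represented implicitly, without an explicit triangle mesh.
res, radius, trunc = 64, 0.4, 0.05
coords = np.linspace(-0.5, 0.5, res)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - radius
tsdf = np.clip(sdf, -trunc, trunc)

# An explicit mesh can still be recovered on demand, e.g. at the decoder side.
verts, faces, _, _ = measure.marching_cubes(tsdf, level=0.0)
print(verts.shape, faces.shape)
```

Working on a regular grid of distance values rather than a triangle mesh is what lets compression be framed as a volumetric signal-coding problem instead of a mesh-encoding one.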
Numerous computer vision problems such as stereo depth estimation, object-class segmentation and foreground/background segmentation can be formulated as per-pixel image labeling tasks. Given one or many images as input, the desired output of these methods is usually a spatially smooth assignment of labels. The large number of such problems has led to significant research efforts, with the state of the art moving from CRF-based approaches to deep CNNs and, more recently, hybrids of the two. Although these have significantly advanced the state of the art, the vast majority solely...
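As a reminder of the formulation behind both the CRF and CNN lines of work mentioned above, the snippet below defines a toy per-pixel labeling energy with a unary term from per-class scores and a Potts smoothness term, and contrasts it with the independent per-pixel argmax a purely feed-forward CNN would produce. All names, weights, and the random scores are illustrative only.

```python
import numpy as np

def label_energy(scores, labels, smoothness=1.0):
    """Energy of a per-pixel labeling: unary cost from per-class scores plus a
    Potts pairwise term that charges for label changes between 4-neighbours."""
    h, w, _ = scores.shape
    unary = -scores[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    pairwise = (labels[:, 1:] != labels[:, :-1]).sum() + \
               (labels[1:, :] != labels[:-1, :]).sum()
    return unary + smoothness * pairwise

# A CNN-only baseline effectively ignores the pairwise term:
scores = np.random.rand(4, 5, 3)        # toy per-pixel class scores (H, W, C)
labels = scores.argmax(axis=-1)         # independent per-pixel argmax
print(label_energy(scores, labels))
```

Hybrid approaches keep the learned unary scores but also minimize (or unroll) the smoothness term, which is where the CRF machinery re-enters the picture.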