- Advanced Vision and Imaging
- Optical measurement and interference techniques
- Computer Graphics and Visualization Techniques
- Human Pose and Action Recognition
- Robotics and Sensor-Based Localization
- 3D Shape Modeling and Analysis
- Image Enhancement Techniques
- Human Motion and Animation
- Video Surveillance and Tracking Methods
- Image Processing Techniques and Applications
- 3D Surveying and Cultural Heritage
- Anomaly Detection Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Video Analysis and Summarization
- Advanced Image Processing Techniques
- Advanced Neural Network Applications
- Advanced Optical Imaging Technologies
- Gaze Tracking and Assistive Technology
- Multimodal Machine Learning Applications
- Photovoltaic System Optimization Techniques
- Solar Radiation and Photovoltaics
- Remote Sensing and LiDAR Applications
- Hand Gesture Recognition Systems
- Advanced Optical Sensing Technologies
- Autonomous Vehicle Technology and Safety
Kyoto Institute of Technology
2025
Kyoto University
2015-2024
Kyoto College of Graduate Studies for Informatics
2009-2024
Carnegie Mellon University
2015
We present an approach to capture the 3D motion of a group of people engaged in social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime the nature of interactions. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the integration of perceptual analyses over a large variety of view points. We present a modularized...
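Integrating perceptual analyses from many calibrated viewpoints ultimately rests on triangulating the same point seen in multiple views. As a minimal, generic sketch (linear DLT triangulation, not the Panoptic Studio pipeline itself; the cameras and the point below are invented for illustration):

```python
import numpy as np

def triangulate(projections, points2d):
    """Linear (DLT) triangulation of one 3D point from n >= 2 views.

    projections: list of 3x4 camera projection matrices P_i
    points2d:    list of (x, y) observations of the same point
    """
    rows = []
    for P, (x, y) in zip(projections, points2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least squares: right singular vector of smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras with a baseline along x, looking down +z.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

X_true = np.array([0.2, -0.1, 4.0])
X_hat = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free observations the homogeneous system has an exact null vector, so the estimate matches the true point; with real detections the same least-squares form gives the best linear fit.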
Recognition of materials from their visual appearance is essential for computer vision tasks, especially those that involve interaction with the real world. Material segmentation, i.e., dense per-pixel recognition of materials, remains challenging as, unlike objects, materials do not exhibit clearly discernible visual signatures in their regular RGB appearances. Different materials, however, lead to different radiometric behaviors, which can often be captured with non-RGB imaging modalities. We realize multimodal material...
This paper is aimed at calibrating the relative posture and position, i.e., the extrinsic parameters, of a stationary camera against a 3D reference object that is not directly visible from the camera. We capture the reference object via a mirror under three different unknown poses, and then calibrate the extrinsic parameters from the 2D appearances of the reflections in the mirrors. The key contribution of this paper is to present a new algorithm that returns a unique solution of the P3P problem for mirrored images. While each P3P problem has up to four solutions, and therefore the three mirrored images define a set of up to 64 solution candidates, our method...
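Mirror-based calibration hinges on the geometry of planar reflection. A small generic sketch (the plane and points are invented for illustration) shows the reflection operator and why a mirrored image flips handedness, which is the root of the solution ambiguity that mirror-based P3P has to resolve:

```python
import numpy as np

def reflect(points, n, d):
    """Reflect 3D points across the mirror plane {x : n.x = d}, with unit n."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    return points - 2.0 * ((points @ n) - d)[:, None] * n

n = np.array([0.0, 0.0, 1.0])   # mirror normal facing the camera along z
d = 2.0                          # mirror plane z = 2
pts = np.array([[0.3, -0.2, 0.5], [1.0, 1.0, 1.0]])
mirrored = reflect(pts, n, d)

# Householder form of the linear part: I - 2 n n^T has determinant -1,
# i.e. reflection swaps handedness of the observed configuration.
H = np.eye(3) - 2.0 * np.outer(n, n)
```

Reflection is an involution (reflecting twice restores the points), which is what lets the real reference pose be reasoned about through its mirrored observations.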
We propose a novel method for spatiotemporal multi-camera calibration using freely moving people in multiview videos. Since calibrating multiple cameras and finding matches across their views are inherently interdependent, performing both in a unified framework poses a significant challenge. We address these issues as a single registration problem of matching two sets of 3D points, leveraging human motion in dynamic multi-person scenes. To this end, we utilize 3D poses obtained from an off-the-shelf monocular pose...
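At its core, the registration view of calibration reduces to rigidly aligning two 3D point sets. A generic least-squares alignment (the Kabsch algorithm; the point sets here are synthetic stand-ins for per-person joint trajectories, not the paper's actual data) can be sketched as:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares R, t with dst ~ src @ R.T + t (Kabsch algorithm)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t

rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))                 # e.g. joints from monocular pose
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true                  # same joints in another frame
R_hat, t_hat = rigid_align(src, dst)
```

The determinant check guards against the degenerate reflection solution, which matters when the point sets are nearly planar (as human joints often are).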
We introduce a novel method for recovering per-pixel surface normals from a pair of polarization cameras. Unlike past methods that use polarimetric observations as auxiliary features for correspondence matching, we fully integrate them into cost volume construction and filtering to directly recover surface normals, not as byproducts of recovered disparities. Our key idea is a cost volume built on a distance defined on the state of the computed normal. We adapt the belief propagation algorithm to filter this cost volume. The method simultaneously estimates...
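On a single scanline, min-sum belief propagation reduces to exact dynamic programming, which makes the cost-volume-filtering idea easy to illustrate. The sketch below uses a made-up unary cost and an absolute-difference smoothness term, not the paper's normal-state distance:

```python
import numpy as np

def filter_scanline(unary, smooth=1.0):
    """Exact min-sum inference along one scanline of a cost volume
    (dynamic programming; equivalent to min-sum BP on a chain).
    unary:  (n_pixels, n_labels) data costs
    smooth: weight of the |label_i - label_j| neighbor penalty
    """
    n, L = unary.shape
    labels = np.arange(L)
    pairwise = smooth * np.abs(labels[:, None] - labels[None, :])
    total = unary[0].astype(float).copy()
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        cand = total[:, None] + pairwise       # (prev_label, cur_label)
        back[i] = cand.argmin(axis=0)
        total = cand.min(axis=0) + unary[i]
    seq = np.empty(n, dtype=int)
    seq[-1] = int(total.argmin())
    for i in range(n - 1, 0, -1):              # backtrack the optimum
        seq[i - 1] = back[i, seq[i]]
    return seq

unary = np.full((5, 3), 1.0)
unary[np.arange(5), 1] = 0.0                   # label 1 cheapest everywhere...
unary[2] = [1.0, 1.0, 0.0]                     # ...except one noisy pixel
seq = filter_scanline(unary, smooth=1.0)
```

Per-pixel winner-take-all would pick the outlier label at the noisy pixel; the smoothed solution keeps the consistent label, which is the behavior cost-volume filtering is after.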
We introduce a novel method and dataset for 3D gaze estimation of a freely moving person from a distance, typically in surveillance views. The eyes cannot be clearly seen in such cases due to occlusion and lack of resolution. Existing methods suffer in accuracy or fall back to approximating gaze with head pose, as they primarily rely on clear, close-up views of the eyes. Our key idea is to instead leverage the intrinsic coordination of gaze, head, and body of people. Our method formulates gaze estimation as Bayesian prediction given temporal estimates of head and body orientations which can...
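The Bayesian flavor of such prediction can be conveyed with the simplest possible fusion: combining two Gaussian orientation beliefs by precision weighting. The numbers are invented and this is a stand-in, not the paper's actual model:

```python
import numpy as np

def fuse_gaussian(mu1, var1, mu2, var2):
    """Product of two Gaussian beliefs = precision-weighted average.
    A minimal sketch of fusing head- and body-orientation cues."""
    prec1, prec2 = 1.0 / var1, 1.0 / var2
    var = 1.0 / (prec1 + prec2)
    mu = var * (prec1 * mu1 + prec2 * mu2)
    return mu, var

# Head yaw estimate: 20 deg, confident. Body yaw estimate: 50 deg, uncertain.
mu, var = fuse_gaussian(20.0, 4.0, 50.0, 36.0)
```

The fused mean lands much closer to the more confident cue, and the fused variance is smaller than either input, which is why combining weak cues from head and body can substitute for unreadable eyes.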
This paper presents a novel approach to achieve accurate and complete multi-view reconstruction of dynamic scenes (or 3D videos). 3D videos consist of sequences of 3D models in motion captured by a surrounding set of video cameras. To date, they are reconstructed using multiview wide-baseline stereo (MVS) techniques. However, it is still tedious to solve the correspondence problem: accuracy falls when photo-consistency is weak, and completeness is limited by self-occlusions. Most MVS techniques were indeed designed to deal with static...
This paper presents a new method to increase the quality of 3D video, a media developed to represent objects in motion. This representation is obtained from multi-view reconstruction techniques that require images recorded simultaneously by several video cameras. All the cameras are calibrated and placed around a dedicated studio to fully surround the models. The limited quantity of cameras may produce an inaccurate 3D model with low-quality texture. To overcome this issue, we first propose super-resolution (SR) for 3D video: SR on...
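The classic iterative back-projection scheme conveys the basic SR loop: simulate the low-resolution observation from the current high-resolution estimate and back-project the residual. A 1D toy version follows, with box down-sampling as an assumed imaging model (the paper's actual SR formulation for 3D video texture is more involved):

```python
import numpy as np

def downsample(x, f):
    """Box down-sampling by integer factor f (toy imaging model)."""
    return x.reshape(-1, f).mean(axis=1)

def upsample(x, f):
    """Nearest-neighbor up-sampling by integer factor f."""
    return np.repeat(x, f)

def iterative_backprojection(low, f, n_iter=50):
    """Refine a high-res estimate until its simulated low-res
    rendering matches the observed low-res signal."""
    high = upsample(low, f)
    for _ in range(n_iter):
        residual = low - downsample(high, f)
        high = high + upsample(residual, f)   # back-project the error
    return high

rng = np.random.default_rng(1)
truth = rng.normal(size=16)
low = downsample(truth, 2)
high = iterative_backprojection(low, 2)
```

The result is guaranteed to be consistent with the observation under the assumed imaging model; recovering detail beyond that requires multiple offset observations or priors, which is where multi-view redundancy helps.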
Multiple-camera systems are currently in development as a means to capture and synthesise highly realistic three-dimensional (3D) video content. Studio design for 3D production of human performance is reviewed from the literature, and practical experience gained in developing prototype studios is reported across two research laboratories. System design should consider the studio backdrop for foreground matting, lighting for ambient illumination, camera configuration for scene coverage, as well as accurate geometric and photometric calibration. A...
This paper presents a novel semantic-based online extrinsic calibration approach, SOIC (so, I see), for Light Detection and Ranging (LiDAR) and camera sensors. Previous online methods usually need prior knowledge of rough initial values for optimization. The proposed approach removes this limitation by converting the initialization problem to a Perspective-n-Point (PnP) problem with the introduction of semantic centroids (SCs). The closed-form solution of the PnP problem has been well researched and can be found with existing methods. Since the centroid...
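The PnP step that such initialization relies on can be sketched with the classic DLT solution — a generic textbook method, not SOIC's own solver; the cameras and points below are synthetic:

```python
import numpy as np

def pnp_dlt(points3d, points2d, K):
    """Closed-form PnP via DLT (needs >= 6 correspondences):
    recover camera pose R, t from 3D points and their pixel projections."""
    # Normalize pixels with the known intrinsics K.
    x = np.linalg.solve(K, np.column_stack([points2d, np.ones(len(points2d))]).T).T
    rows = []
    for (X, Y, Z), (u, v, _) in zip(points3d, x):
        P = [X, Y, Z, 1.0]
        rows.append(np.concatenate([P, [0.0] * 4, [-u * p for p in P]]))
        rows.append(np.concatenate([[0.0] * 4, P, [-v * p for p in P]]))
    _, _, vt = np.linalg.svd(np.stack(rows))
    M = vt[-1].reshape(3, 4)                      # ~ scale * [R | t]
    scale = np.linalg.norm(M[:, :3], axis=1).mean()
    M /= scale
    if np.linalg.det(M[:, :3]) < 0:               # fix the sign ambiguity
        M = -M
    U, _, Vt = np.linalg.svd(M[:, :3])            # project onto SO(3)
    return U @ Vt, M[:, 3]

angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1.0]])
t_true = np.array([0.1, -0.2, 5.0])
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
rng = np.random.default_rng(2)
pts3d = rng.uniform(-1, 1, size=(8, 3))           # e.g. semantic centroids
h = (pts3d @ R_true.T + t_true) @ K.T
pts2d = h[:, :2] / h[:, 2:3]
R_hat, t_hat = pnp_dlt(pts3d, pts2d, K)
```

With exact correspondences the recovered pose matches the ground truth; in an online setting the same closed form serves as the initializer that the subsequent semantic optimization refines.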
This paper is aimed at presenting a new virtual camera model which can efficiently model refraction through flat housings in underwater photography. The key idea is to employ a pixel-wise varifocal length concept to encode the refractive projection inside the housing. The radially-symmetric structure of the varifocal lengths around the normal of the housing surface allows us to describe the refraction with a compact representation. We show that this model realizes an efficient forward projection computation and a linear extrinsic calibration in water. Evaluations using synthesized and real...
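The underlying physics is Snell's law at the flat port. A minimal vector-form refraction sketch (with an assumed water index of 1.333; this illustrates the bending effect the varifocal model encodes per pixel, not the paper's model itself):

```python
import numpy as np

def refract(d, n, eta):
    """Vector form of Snell's law: refract unit direction d at a surface
    with unit normal n pointing toward the incoming ray; eta = n_in / n_out.
    Returns None on total internal reflection."""
    cos_i = -np.dot(n, d)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return None
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

# A ray leaving the housing (air, n=1) into water (assumed n=1.333).
eta = 1.0 / 1.333
n = np.array([0.0, 0.0, -1.0])        # port normal, toward the camera
d = np.array([0.3, 0.0, 1.0])
d = d / np.linalg.norm(d)
t = refract(d, n, eta)                 # refracted ray, bent toward the normal
```

Because the ray bends toward the port normal, off-center pixels see the scene as if through a longer focal length, which is exactly the pixel-wise (varifocal) behavior the model captures.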
We introduce a novel multi-view stereo (MVS) method that can simultaneously recover not just per-pixel depth but also surface normals, together with the reflectance, of textureless, complex non-Lambertian surfaces captured under known natural illumination. Our key idea is to formulate MVS as an end-to-end learnable network, which we refer to as nLMVS-Net, that seamlessly integrates radiometric cues to leverage surface normals as view-independent features for learned cost volume construction and filtering. It first...
Neural Radiance Fields (NeRF) is a popular neural representation for novel view synthesis. By querying spatial points and viewing directions, a multilayer perceptron (MLP) can be trained to output the volume density and radiance along each ray, which lets us render novel views of the scene. The original NeRF and its recent variants, however, are limited to opaque scenes dominated by diffuse reflection surfaces and cannot handle complex refractive surfaces well. We introduce NeRFrac to realize novel view synthesis of scenes captured through refractive surfaces, typically...
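The rendering step NeRF relies on is a simple quadrature along each ray. A minimal NumPy version of that volume-rendering rule (the densities and radiances below are made up; in NeRF they come from the MLP):

```python
import numpy as np

def render_ray(sigmas, radiances, deltas):
    """NeRF's numerical volume rendering along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with transmittance T_i = prod_{j<i} exp(-sigma_j * delta_j)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    color = (weights[:, None] * radiances).sum(axis=0)
    return color, weights

sigmas = np.array([0.0, 0.5, 5.0, 0.1])       # density samples along the ray
radiances = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)
deltas = np.full(4, 0.25)                      # inter-sample distances
color, weights = render_ray(sigmas, radiances, deltas)
```

The weights form a partial partition of unity concentrated at the first dense (opaque) sample; refraction breaks the straight-ray assumption baked into this quadrature, which is the gap NeRFrac addresses.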
The sky exhibits a unique spatial polarization pattern by scattering the unpolarized sun light. Just like insects use this angular pattern to navigate, we use it to map pixels to directions on the sky. That is, we show that the directional information encoded in the polarimetric appearance of an object captured under the sky can be decoded to reveal the surface normal at each pixel. We derive a polarimetric reflection model of a diffuse-plus-mirror surface lit by the sun and a clear sky. This model is used to recover the per-pixel surface normal of an object from a single polarimetric image or from multiple polarimetric images captured at different times of the day. We experimentally evaluate the accuracy...
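The mapping from sky direction to polarization can be illustrated with the single-scattering Rayleigh model, in which the degree of linear polarization depends only on the angular distance to the sun (an idealized textbook model, not the paper's full diffuse-plus-mirror formulation):

```python
import numpy as np

def rayleigh_dop(view, sun, p_max=1.0):
    """Degree of linear polarization of singly-scattered skylight:
    p = p_max * sin^2(g) / (1 + cos^2(g)),
    where g is the angle between the view ray and the sun direction."""
    cos_g = np.dot(view, sun) / (np.linalg.norm(view) * np.linalg.norm(sun))
    return p_max * (1.0 - cos_g**2) / (1.0 + cos_g**2)

sun = np.array([0.0, 0.0, 1.0])                      # sun at the zenith
toward_sun = rayleigh_dop(np.array([0.0, 0.0, 1.0]), sun)   # g = 0
sideways = rayleigh_dop(np.array([1.0, 0.0, 0.0]), sun)     # g = 90 deg
```

Polarization vanishes toward the sun and peaks 90 degrees away, so the measured polarization state constrains which part of the sky a surface patch reflects — the cue the method decodes into per-pixel normals.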
We propose a novel camera calibration method for a room-scale multi-view imaging system. Our key idea is to leverage our articulated body movements as a calibration target. We show that a freely moving person provides a trajectory of a set of oriented points (e.g., the neck joint with the spine direction) from which we can estimate the locations and poses of all the cameras observing them. The method only requires the videos to be synced, and 2D human poses are...
We introduce a novel neural network-based BRDF model and a Bayesian framework for object inverse rendering, i.e., joint estimation of reflectance and natural illumination from a single image of an object of known geometry. The BRDF is expressed with an invertible neural network, namely, a normalizing flow, which provides the expressive power of a high-dimensional representation, the computational simplicity of a compact analytical model, and the physical plausibility of a real-world BRDF. We extract a latent space of real-world BRDFs by conditioning this network, which directly results in...
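The invertibility that normalizing flows provide can be shown with a single affine coupling layer: the forward map is cheap, the inverse is exact, and the log-determinant of the Jacobian is a simple sum. A toy NumPy layer with random fixed weights (an illustration of the flow mechanism, not a trained BRDF model):

```python
import numpy as np

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(2, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 4)), rng.normal(size=4)

def nets(x_a):
    """Tiny fixed MLP producing per-dimension scale s and shift t."""
    h = np.tanh(x_a @ W1 + b1)
    out = h @ W2 + b2
    return out[:, :2], out[:, 2:]            # s, t for the transformed dims

def coupling_forward(x):
    """Affine coupling: pass x_a through, map x_b -> x_b * exp(s) + t."""
    x_a, x_b = x[:, :2], x[:, 2:]
    s, t = nets(x_a)
    return np.concatenate([x_a, x_b * np.exp(s) + t], axis=1)

def coupling_inverse(y):
    """Exact inverse: x_a passed through unchanged, so s, t are recomputable."""
    y_a, y_b = y[:, :2], y[:, 2:]
    s, t = nets(y_a)
    return np.concatenate([y_a, (y_b - t) * np.exp(-s)], axis=1)

x = rng.normal(size=(5, 4))
y = coupling_forward(x)
x_back = coupling_inverse(y)
# log|det J| = s.sum(axis=1): density evaluation stays cheap, which is
# what makes such a model usable inside Bayesian inverse rendering.
```

Stacking such layers with permuted dimension splits yields an expressive yet analytically invertible map, the property the invertible BRDF representation exploits.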