- Advanced Vision and Imaging
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Generative Adversarial Networks and Image Synthesis
- Visual Perception and Processing Mechanisms
- Advanced Image Processing Techniques
- Image and Signal Denoising Methods
- Neural Networks and Applications
- Robotics and Sensor-Based Localization
- Advanced Neural Network Applications
- 3D Surveying and Cultural Heritage
- Visual Attention and Saliency Detection
- Image Processing and 3D Reconstruction
- Music and Audio Processing
- Remote Sensing and LiDAR Applications
- Advanced Numerical Analysis Techniques
- Robot Manipulation and Learning
- Model Reduction and Neural Networks
- Neural Networks and Reservoir Computing
- Face Recognition and Analysis
- Advanced Image Fusion Techniques
- Domain Adaptation and Few-Shot Learning
- Industrial Vision Systems and Defect Detection
- Olfactory and Sensory Function Studies
- Speech and Audio Processing
Massachusetts Institute of Technology
2017-2024
Moscow Institute of Thermal Technology
2022-2023
Stanford University
2017-2021
Stanford Medicine
2018
Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose...
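As a concrete illustration of the periodic-activation idea this abstract points toward, here is a minimal PyTorch sketch of a sine-activated implicit network. The layer widths, the frequency factor omega_0 = 30, and the initialization bounds are assumptions following common practice, not details taken from the abstract.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by a sine nonlinearity."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        # Initialization keeps pre-activations well-distributed across depth.
        with torch.no_grad():
            bound = 1.0 / in_features if is_first else (6.0 / in_features) ** 0.5 / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

# f: R^2 -> R^3, e.g. pixel coordinates -> RGB. Because sin is smooth,
# derivatives of the represented signal are available via autograd.
siren = nn.Sequential(
    SineLayer(2, 256, is_first=True),  # first layer sees raw (x, y) coordinates
    SineLayer(256, 256),
    SineLayer(256, 256),
    nn.Linear(256, 3),
)

coords = torch.rand(1024, 2, requires_grad=True)  # sample coordinates in [0, 1)^2
rgb = siren(coords)                               # differentiable w.r.t. coords
```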
Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions...
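To make the "scenes as continuous functions" idea concrete, the following hedged sketch shows a coordinate-to-feature network rendered by a learned ray-marching loop, loosely in the spirit of the abstract; the actual paper uses a more sophisticated recurrent marcher, and all sizes and step rules here are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

phi = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))  # coordinate -> feature
step_net = nn.Linear(256, 1)  # predicts how far to march along each ray
rgb_net = nn.Linear(256, 3)   # decodes the final feature to a color

origins = torch.zeros(1024, 3)                    # camera center for 1024 rays
dirs = F.normalize(torch.randn(1024, 3), dim=-1)  # ray directions
depth = torch.full((1024, 1), 0.1)                # initial marching depth

for _ in range(10):                               # learned, differentiable ray marching
    feat = phi(origins + depth * dirs)
    depth = depth + torch.sigmoid(step_net(feat)) # always step forward by a positive amount
rgb = rgb_net(phi(origins + depth * dirs))        # color at the final surface estimate
```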
In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. At its core, our approach is based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure. Our approach combines insights from 3D geometric computer vision with recent advances in learning image-to-image...
Convolutional neural networks (CNNs) excel in a wide variety of computer vision applications, but their high performance also comes at a high computational cost. Despite efforts to increase efficiency both algorithmically and with specialized hardware, it remains difficult to deploy CNNs in embedded systems due to tight power budgets. Here we explore a complementary strategy that incorporates a layer of optical computing prior to electronic computing, improving performance on image classification tasks while adding minimal electronic computational cost...
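A hedged sketch of the hybrid idea follows: a first convolutional layer intended for optical implementation, so its kernel is constrained non-negative (incoherent optics cannot realize negative weights), followed by a small electronic classifier. The constraint, kernel size, and classifier are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Kernel that would be realized as a passive optical element.
        self.optical_kernel = nn.Parameter(torch.rand(8, 1, 11, 11))
        self.electronic = nn.Sequential(nn.Flatten(), nn.Linear(8 * 28 * 28, 10))

    def forward(self, x):
        psf = F.relu(self.optical_kernel)   # enforce non-negative "optical" weights
        x = F.conv2d(x, psf, padding=5)     # this convolution would run in free-space optics
        return self.electronic(x)           # only this part consumes electronic power

logits = HybridNet()(torch.rand(4, 1, 28, 28))  # e.g. MNIST-sized input batch
```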
Understanding how people explore immersive virtual environments is crucial for many applications, such as designing virtual reality (VR) content, developing new compression algorithms, or learning computational models of saliency and visual attention. Whereas a body of recent work has focused on modeling saliency in desktop viewing conditions, VR is very different from these conditions in that viewing behavior is governed by stereoscopic vision and the complex interaction of head orientation, gaze, and other kinematic constraints. To...
In typical cameras the optical system is designed first; once it is fixed, the parameters of the image processing algorithm are tuned to get good image reproduction. In contrast to this sequential design approach, we consider the joint optimization of an optical system (for example, the physical shape of a lens) together with the parameters of the reconstruction algorithm. We build a fully-differentiable simulation model that maps the true source image to the reconstructed one. The model includes diffractive light propagation, depth and wavelength-dependent effects, noise...
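The following toy sketch shows the joint-optimization structure described above: a learnable point spread function stands in for the differentiable optics model, and a single convolution stands in for the reconstruction algorithm, with both receiving gradients from one loss. The PSF parameterization, noise level, and network are assumptions; the paper's simulation is far richer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

psf_logits = torch.zeros(1, 1, 9, 9, requires_grad=True)  # learnable "lens" parameters
recon_net = nn.Conv2d(1, 1, 5, padding=2)                 # stand-in reconstruction algorithm

opt = torch.optim.Adam([psf_logits, *recon_net.parameters()], lr=1e-3)

for _ in range(100):
    source = torch.rand(8, 1, 64, 64)                     # ground-truth source images
    # Softmax makes the PSF non-negative and energy-conserving.
    psf = F.softmax(psf_logits.flatten(), 0).view(1, 1, 9, 9)
    sensor = F.conv2d(source, psf, padding=4)             # differentiable image formation
    sensor = sensor + 0.01 * torch.randn_like(sensor)     # sensor noise
    loss = F.mse_loss(recon_net(sensor), source)          # one loss trains optics + algorithm
    opt.zero_grad(); loss.backward(); opt.step()
```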
Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep...
Recent advances in machine learning have led to increased interest in solving visual computing problems using methods that employ coordinate-based neural networks. These methods, which we call neural fields, parameterize physical properties of scenes or objects across space and time. They have seen widespread success in problems such as 3D shape and image synthesis, animation of human bodies, 3D reconstruction, and pose estimation. Rapid progress has led to numerous papers, but a consolidation of the discovered knowledge has not yet...
Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanying textures (e.g.,...
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than its architecture details. But collecting, processing, and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness, and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent or mitigate problems regarding...
Emerging neural radiance fields (NeRF) are a promising scene representation for computer graphics, enabling high-quality 3D reconstruction and novel view synthesis from image observations. However, editing a scene represented by a NeRF is challenging, as the underlying connectionist representations such as MLPs or voxel grids are not object-centric or compositional. In particular, it has been difficult to selectively edit specific regions or objects. In this work, we tackle the problem of semantic decomposition of NeRFs...
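One way to picture region-selective editing of a field-based representation is sketched below: an auxiliary feature field is queried alongside the density field, points are selected by similarity to a query descriptor, and the selected object's density is masked out. The feature field, query, and threshold are hypothetical stand-ins, not the paper's specific machinery.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_field = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 64))
density_field = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 1))

xyz = torch.randn(4096, 3)                   # 3D sample points along camera rays
query = torch.randn(64)                      # descriptor of the object to select (assumed given)
sim = F.cosine_similarity(feature_field(xyz), query.expand(4096, 64), dim=-1)
mask = (sim > 0.5).float().unsqueeze(-1)     # points assigned to the selected object
density = density_field(xyz).relu() * (1 - mask)  # "delete" the object before volume rendering
```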
We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging) via category-level descriptors. We employ this representation for object manipulation, where, given a task demonstration, we want to repeat the same task on a new object instance from the same category. We propose to achieve this objective by searching (via optimization) for the pose whose descriptor matches the one observed in the demonstration. NDFs are conveniently trained in a self-supervised fashion...
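The search-via-optimization step can be sketched as follows: gripper-attached query points are pushed through a descriptor field, and a pose is optimized until the descriptors match those recorded in the demonstration. The field is a toy stand-in, and the optimization is restricted to translation for brevity; the paper optimizes over full poses.

```python
import torch
import torch.nn as nn

ndf = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 64))  # stand-in descriptor field
query = torch.randn(32, 3)  # points rigidly attached to the gripper
with torch.no_grad():
    # Descriptors recorded at the demonstrated grasp pose (offset is illustrative).
    demo_desc = ndf(query + torch.tensor([0.3, 0.0, 0.1]))

translation = torch.zeros(3, requires_grad=True)  # pose parameters (translation only here)
opt = torch.optim.Adam([translation], lr=1e-2)
for _ in range(200):
    loss = (ndf(query + translation) - demo_desc).abs().sum()  # descriptor-matching energy
    opt.zero_grad(); loss.backward(); opt.step()
```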
A broad class of problems at the core of computational imaging, sensing, and low-level computer vision reduces to the inverse problem of extracting latent images that follow a prior distribution from measurements taken under a known physical image formation model. Traditionally, hand-crafted priors along with iterative optimization methods have been used to solve such problems. In this paper we present unrolled optimization with deep priors, a principled framework for infusing knowledge of the image formation into deep networks, inspired by classical...
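A minimal sketch of the unrolling pattern, under assumptions: each unrolled iteration alternates a gradient step on the data term ||Ax - y||^2 with a small learned network acting as a proximal/prior step. The operator, step sizes, and per-iteration networks are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrolledSolver(nn.Module):
    def __init__(self, n_iters=5):
        super().__init__()
        # One learned prior step per unrolled iteration.
        self.prox = nn.ModuleList(nn.Conv2d(1, 1, 3, padding=1) for _ in range(n_iters))
        self.step = nn.Parameter(torch.full((n_iters,), 0.5))  # learnable step sizes

    def forward(self, y, A):
        x = y.clone()                      # initialize with the measurements
        for prox, t in zip(self.prox, self.step):
            grad = A(A(x) - y)             # gradient of the data term (A taken symmetric here)
            x = prox(x - t * grad)         # gradient step, then learned prior
        return x

blur = lambda im: F.avg_pool2d(im, 3, 1, 1)  # toy (approximately symmetric) forward operator
solver = UnrolledSolver()
y = blur(torch.rand(1, 1, 32, 32))           # simulated measurements
x_hat = solver(y, blur)                      # end-to-end trainable reconstruction
```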
Virtual reality systems are widely believed to be the next major computing platform. There are, however, some barriers to adoption that must be addressed, such as that of motion sickness, which can lead to undesirable symptoms including postural instability, headaches, and nausea. Motion sickness in virtual reality occurs as a result of moving visual stimuli that cause users to perceive self-motion while they remain stationary in the real world. There are several contributing factors to both this perception and the subsequent onset of sickness, including field of view,...
Neural implicit shape representations are an emerging paradigm that offers many potential benefits over conventional discrete representations, including memory efficiency at a high spatial resolution. Generalizing across shapes with such neural implicit representations amounts to learning priors over the respective function space and enables geometry reconstruction from partial or noisy observations. Existing generalization methods rely on conditioning a neural network on a low-dimensional latent code that is either regressed by an encoder...
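For reference, the latent-code conditioning scheme the abstract attributes to existing methods can be sketched as an "auto-decoder": per-shape codes are free parameters optimized jointly with a shared implicit network. Code dimension, network size, and the toy supervision are assumptions.

```python
import torch
import torch.nn as nn

n_shapes, code_dim = 100, 64
codes = nn.Parameter(torch.randn(n_shapes, code_dim) * 0.01)  # one latent code per shape
sdf_net = nn.Sequential(nn.Linear(3 + code_dim, 256), nn.ReLU(), nn.Linear(256, 1))

opt = torch.optim.Adam([codes, *sdf_net.parameters()], lr=1e-4)
shape_idx = torch.randint(0, n_shapes, (512,))       # which shape each sample belongs to
xyz = torch.randn(512, 3)                            # sample points with known SDF values
target_sdf = torch.zeros(512, 1)                     # e.g. points lying on the surface
pred = sdf_net(torch.cat([xyz, codes[shape_idx]], dim=-1))
loss = (pred - target_sdf).abs().mean()              # fit network and codes jointly
opt.zero_grad(); loss.backward(); opt.step()
```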
Inferring representations of 3D scenes from 2D observations is a fundamental problem of computer graphics, computer vision, and artificial intelligence. Emerging 3D-structured neural scene representations are a promising approach to 3D scene understanding. In this work, we propose a novel neural scene representation, Light Field Networks or LFNs, which represent both geometry and appearance of the underlying 3D scene in a 360-degree, four-dimensional light field parameterized via a neural implicit representation. Rendering a ray from an LFN requires only a single network...
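The single-evaluation rendering property can be illustrated as below: each ray is mapped to a color by one forward pass, with no marching or sampling along the ray. The Plücker parameterization of rays and the network size are assumptions consistent with the abstract's 4D light-field description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

lfn = nn.Sequential(
    nn.Linear(6, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),                         # 6D ray coordinates -> RGB
)

origins = torch.randn(1024, 3)                 # ray origins
dirs = F.normalize(torch.randn(1024, 3), dim=-1)
moment = torch.cross(origins, dirs, dim=-1)    # Plücker moment: o x d
rays = torch.cat([dirs, moment], dim=-1)       # line-invariant 6D ray coordinates
rgb = lfn(rays)                                # one network evaluation per ray
```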
Loss Functions for Neural Rendering (Jun-Yan Zhu)
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. In this challenging regime, 3D scene points are regularly observed only once, requiring prior-based reconstruction of scene geometry and appearance. We find that existing approaches to novel view synthesis from sparse observations fail due to recovering incorrect 3D geometry, and that the high cost of differentiable rendering precludes their scaling to large-scale training. We take a step towards resolving these shortcomings by formulating a multi-view transformer...
Traditional cinematography has relied for over a century on a well-established set of editing rules, called continuity editing, to create a sense of situational continuity. Despite massive changes in visual content across cuts, viewers in general experience no trouble perceiving the discontinuous flow of information as coherent events. However, Virtual Reality (VR) movies are intrinsically different from traditional movies in that the viewer controls the camera orientation at all times. As a consequence, common editing techniques...
Real-world imaging systems acquire measurements that are degraded by noise, optical aberrations, and other imperfections that make image processing for human viewing and higher-level perception tasks challenging. Conventional cameras address this problem by compartmentalizing imaging from high-level task processing. As such, conventional imaging involves processing the RAW sensor measurements in a sequential pipeline of steps, such as demosaicking, denoising, deblurring, tone-mapping, and compression. This pipeline is optimized to obtain a visually...
Denoising diffusion models are a powerful type of generative model used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with a given image, but ground-truth 3D scenes are unavailable and only 2D images are accessible. To address this limitation, we propose a novel class of denoising diffusion probabilistic models that learn...
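The loss structure this points toward can be sketched as follows: the denoiser's estimate of the unobserved latent signal is pushed through a differentiable forward model and compared with the available observation, so supervision lives entirely in observation space. This toy mirrors only the loss structure, not the paper's full construction; every component here is a stand-in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

denoiser = nn.Sequential(nn.Linear(17, 128), nn.ReLU(), nn.Linear(128, 64))
forward_model = nn.Linear(64, 16)        # differentiable "renderer": latent -> observation
for p in forward_model.parameters():
    p.requires_grad_(False)              # fixed, known image formation model

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
x0 = torch.randn(32, 64)                 # latent signals, used ONLY to synthesize toy data
obs = forward_model(x0)                  # in practice, observations come from a dataset

t = torch.rand(32, 1)                                     # diffusion time
noisy_obs = (1 - t) * obs + t * torch.randn_like(obs)     # toy noising in observation space
x0_hat = denoiser(torch.cat([noisy_obs, t], dim=-1))      # estimate the clean latent
loss = F.mse_loss(forward_model(x0_hat), obs)             # loss through the forward model
opt.zero_grad(); loss.backward(); opt.step()              # no ground-truth latents needed
```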