- Advanced Vision and Imaging
- 3D Shape Modeling and Analysis
- Robotics and Sensor-Based Localization
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- 3D Surveying and Cultural Heritage
- Optical Measurement and Interference Techniques
- Multimodal Machine Learning Applications
- Computer Graphics and Visualization Techniques
- Image Processing Techniques and Applications
- Reinforcement Learning in Robotics
- Image Retrieval and Classification Techniques
- Cellular transport and secretion
- Calcium signaling and nucleotide metabolism
- Consumer Market Behavior and Pricing
- Medical Image Segmentation Techniques
- Insurance, Mortality, Demography, Risk Management
- Autophagy in Disease and Therapy
- Domain Adaptation and Few-Shot Learning
- Visual Attention and Saliency Detection
- User Authentication and Security Systems
- Consumer Packaging Perceptions and Trends
- Digital Platforms and Economics
- Advanced Bandit Algorithms Research
Carnegie Mellon University
2023-2024
Nvidia (United States)
2020-2023
Nvidia (United Kingdom)
2020-2022
Weatherford College
2021
Toronto Metropolitan University
2021
Stanford University
2015-2020
Stanford Health Care
2018
In many robotics and VR/AR applications, 3D-videos are readily-available input sources (a continuous sequence of depth images, or LIDAR scans). However, in most cases, the 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose generalized sparse convolutions that encompass all discrete convolutions. To implement the generalized sparse convolution,...
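The sparse-tensor idea above can be illustrated with a minimal numpy sketch: features are stored only at occupied coordinates, and the convolution gathers from whichever neighbor offsets are occupied. The `sparse_conv` helper and its dict-of-offsets kernel layout are illustrative assumptions, not the actual Minkowski Engine API.

```python
import numpy as np

def sparse_conv(coords, feats, kernel):
    """Toy generalized sparse convolution on a 2D sparse tensor.

    coords: (N, 2) int array of occupied sites; feats: (N, C) features.
    kernel: dict mapping offset tuple -> (C, C_out) weight matrix.
    """
    # Hash table from coordinate to feature row, as in sparse tensor libraries.
    table = {tuple(int(v) for v in c): i for i, c in enumerate(coords)}
    c_out = next(iter(kernel.values())).shape[1]
    out = np.zeros((len(coords), c_out))
    for i, c in enumerate(coords):
        for off, W in kernel.items():
            nb = (int(c[0]) + off[0], int(c[1]) + off[1])
            j = table.get(nb)
            if j is not None:  # contribute only where the tensor is occupied
                out[i] += feats[j] @ W
    return out
```

With two occupied sites `[[0, 0], [0, 1]]`, features `[[1.], [2.]]`, and identity weights at offsets `(0, 0)` and `(0, 1)`, the first site sums both features (3.0) while the second only sees itself (2.0), showing how empty space is skipped entirely.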
Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we explicitly model objects and their relationships using scene graphs, a visually-grounded graphical structure of an image. We propose a novel end-to-end model that generates such a structured scene representation from an input image. Our key insight is that the graph generation problem can be formulated as message passing between the primal node graph and its dual edge graph. Our joint...
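A toy version of message passing between node states and edge states can be sketched in numpy; the fixed 0.5 mixing weight and the `message_passing` helper are simplifying assumptions standing in for the learned GRU updates in the paper.

```python
import numpy as np

def message_passing(node_h, edge_h, edges, steps=2):
    """Alternate updates between node features and edge features.

    node_h: (N, C) node states; edge_h: (E, C) edge states.
    edges: list of (src, dst) node index pairs aligned with edge_h.
    """
    for _ in range(steps):
        new_nodes = node_h.copy()
        for e, (s, d) in enumerate(edges):
            # Each edge sends its state to both endpoints.
            new_nodes[s] += 0.5 * edge_h[e]
            new_nodes[d] += 0.5 * edge_h[e]
        new_edges = edge_h.copy()
        for e, (s, d) in enumerate(edges):
            # Each edge aggregates the states of its endpoints.
            new_edges[e] += 0.5 * (node_h[s] + node_h[d])
        node_h, edge_h = new_nodes, new_edges
    return node_h, edge_h
```

After a couple of iterations, node and edge states mix information across the graph, which is the mechanism the abstract describes for jointly refining object and relationship predictions.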
We introduce a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future prediction of multiple interacting agents in dynamic scenes. DESIRE effectively predicts future locations of objects in dynamic scenes by 1) accounting for the multi-modal nature of future prediction (i.e., given the same context, the future may vary), 2) foreseeing potential future outcomes and making a strategic prediction based on them, and 3) reasoning not only from the past motion history, but also from the scene context as well as the interactions among the agents. DESIRE achieves these in a single...
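The sample-then-rank structure of DESIRE can be sketched with a toy numpy example: draw several stochastic future trajectories, score each with a cost, and keep the best. The Gaussian sampler and the constant-velocity cost below are stand-ins for the paper's CVAE sampler and learned IOC ranking module, not its actual components.

```python
import numpy as np

def sample_and_rank(past, n_samples=8, horizon=4, seed=0):
    """Generate candidate futures from past positions and keep the best-ranked.

    past: (T, 2) observed positions; returns a (horizon, 2) trajectory.
    """
    rng = np.random.default_rng(seed)
    vel = past[-1] - past[-2]  # last observed velocity
    # Constant-velocity extrapolation used as a toy ranking reference.
    cv = past[-1] + np.cumsum(np.tile(vel, (horizon, 1)), axis=0)
    best, best_cost = None, np.inf
    for _ in range(n_samples):
        steps = vel + rng.normal(scale=0.1, size=(horizon, 2))
        traj = past[-1] + np.cumsum(steps, axis=0)  # one stochastic sample
        cost = np.linalg.norm(traj - cv)            # toy ranking cost
        if cost < best_cost:
            best, best_cost = traj, cost
    return best
```

The point of the sketch is the control flow, not the model: multiple hypotheses capture multi-modality, and a ranking cost picks the strategic prediction.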
3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works leverage the capabilities of Neural Networks (NNs), but are limited to coarse voxel predictions and do not explicitly enforce global consistency. We present SEGCloud, an end-to-end framework to obtain 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI) and fully connected Conditional Random Fields (FC-CRF)...
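The trilinear interpolation step, which transfers coarse voxel scores back to individual points, is easy to show concretely. This is a generic numpy sketch of trilinear interpolation, not SEGCloud's implementation; the `trilinear_interp` name and the (D, H, W, C) grid layout are assumptions.

```python
import numpy as np

def trilinear_interp(grid, pts):
    """Interpolate voxel scores at continuous point locations.

    grid: (D, H, W, C) per-voxel scores.
    pts:  (N, 3) interior point coordinates in voxel units, axis order (d, h, w).
    """
    base = np.floor(pts).astype(int)   # lower corner of enclosing cell
    frac = pts - base                  # position within the cell, in [0, 1)
    out = np.zeros((len(pts), grid.shape[-1]))
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                # Weight of this corner: product of per-axis linear weights.
                w = ((dz * frac[:, 0] + (1 - dz) * (1 - frac[:, 0])) *
                     (dy * frac[:, 1] + (1 - dy) * (1 - frac[:, 1])) *
                     (dx * frac[:, 2] + (1 - dx) * (1 - frac[:, 2])))
                idx = base + [dz, dy, dx]
                out += w[:, None] * grid[idx[:, 0], idx[:, 1], idx[:, 2]]
    return out
```

A point at the center of a cell receives the average of its eight corner scores; a point exactly on a voxel corner receives that voxel's score unchanged.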
Extracting geometric features from 3D scans or point clouds is the first step in applications such as registration, reconstruction, and tracking. State-of-the-art methods require computing low-level features as input or extracting patch-based features with limited receptive field. In this work, we present fully-convolutional geometric features, computed in a single pass by a 3D fully-convolutional network. We also present new metric learning losses that dramatically improve performance. Fully-convolutional geometric features are compact, capture broad spatial context, and scale to...
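One family of metric learning losses of the kind the abstract mentions is a hardest-negative contrastive loss over batches of matched descriptors. The sketch below is a generic numpy formulation under assumed margins; the `hardest_negative_loss` name and default margin values are illustrative, not the paper's exact loss.

```python
import numpy as np

def hardest_negative_loss(fa, fb, margin_pos=0.1, margin_neg=1.4):
    """Contrastive loss with in-batch hardest-negative mining.

    fa[i] and fb[i] are descriptors of a matched pair; every fb[j], j != i,
    is a candidate negative for fa[i].
    """
    # Pairwise distance matrix between the two descriptor sets.
    D = np.linalg.norm(fa[:, None, :] - fb[None, :, :], axis=-1)
    pos = np.diag(D)                          # matched-pair distances
    Dn = D + np.eye(len(fa)) * 1e9            # mask out the positives
    hardest = Dn.min(axis=1)                  # closest non-matching descriptor
    loss = (np.maximum(pos - margin_pos, 0) ** 2 +
            np.maximum(margin_neg - hardest, 0) ** 2)
    return loss.mean()
```

Matched pairs are pulled inside the positive margin while the single hardest negative per anchor is pushed beyond the negative margin, which is what makes such losses effective without exhaustive negative sampling.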
We present Deep Global Registration, a differentiable framework for pairwise registration of real-world 3D scans. Our global registration is based on three modules: a 6-dimensional convolutional network for correspondence confidence prediction, a differentiable Weighted Procrustes algorithm for closed-form pose estimation, and a robust gradient-based SE(3) optimizer for pose refinement. Experiments demonstrate that our approach outperforms state-of-the-art methods, both learning-based and classical, on real-world data.
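The closed-form Weighted Procrustes step can be written in a few lines of numpy: given correspondences and per-correspondence confidence weights, a weighted SVD recovers the rigid transform. This is a standard weighted Kabsch/Procrustes sketch, not the paper's code; the `weighted_procrustes` name is an assumption.

```python
import numpy as np

def weighted_procrustes(X, Y, w):
    """Closed-form rigid fit: find R, t minimizing sum_i w_i ||R x_i + t - y_i||^2.

    X, Y: (N, 3) corresponding points; w: (N,) nonnegative confidence weights.
    """
    w = w / w.sum()
    mx = (w[:, None] * X).sum(0)              # weighted centroids
    my = (w[:, None] * Y).sum(0)
    # Weighted cross-covariance between centered point sets.
    S = (Y - my).T @ (w[:, None] * (X - mx))
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = my - R @ mx
    return R, t
```

Because every step is a differentiable tensor operation, gradients can flow from the pose error back into the confidence weights, which is the property the framework relies on.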
We present a deep learning framework for accurate visual correspondences and demonstrate its effectiveness on both geometric and semantic matching, spanning across rigid motions to intra-class shape or appearance variations. In contrast to previous CNN-based approaches that optimize a surrogate patch similarity objective, we use deep metric learning to directly learn a feature space that preserves either geometric or semantic similarity. Our fully convolutional architecture, along with a novel correspondence contrastive loss, allows faster training by...
In this work, we propose a camera self-calibration algorithm for generic cameras with arbitrary non-linear distortions. We jointly learn the geometry of the scene and the accurate camera parameters without any calibration objects. Our camera model consists of a pinhole model, fourth order radial distortion, and a generic noise model that can learn arbitrary non-linear distortions. While traditional self-calibration algorithms mostly rely on geometric constraints, we additionally incorporate photometric consistency. This requires learning the geometry of the scene, for which we use Neural Radiance Fields (NeRF). We also propose a new loss...
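The pinhole-plus-radial-distortion part of such a camera model is standard and easy to sketch in numpy. The `project` helper below shows a fourth-order radial distortion (coefficients k1, k2); the learned per-ray noise model from the paper is omitted.

```python
import numpy as np

def project(points_cam, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project 3D camera-frame points (z > 0) to pixels.

    Pinhole intrinsics (fx, fy, cx, cy) plus fourth-order radial distortion:
    d(r) = 1 + k1 * r^2 + k2 * r^4 applied to normalized coordinates.
    """
    x = points_cam[:, 0] / points_cam[:, 2]   # normalized image coordinates
    y = points_cam[:, 1] / points_cam[:, 2]
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2          # radial distortion factor
    u = fx * d * x + cx
    v = fy * d * y + cy
    return np.stack([u, v], axis=1)
```

With k1 = k2 = 0 this reduces to a plain pinhole projection; nonzero coefficients bend points radially away from (or toward) the principal point, and self-calibration amounts to optimizing these parameters jointly with the scene.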
Humans can easily imagine the complete 3D geometry of occluded objects and scenes. This appealing ability is vital for recognition and understanding. To enable such capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete volumetric semantics from only 2D images. Our framework adopts a two-stage design where we start from a sparse set of visible and occupied voxel queries from depth estimation, followed by a densification stage that generates dense 3D voxels from the sparse ones. A key idea of this...
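The first stage, selecting visible occupied voxels from depth, can be illustrated by back-projecting a depth map and quantizing to a voxel grid. This numpy sketch assumes a simple pinhole camera and a `visible_voxel_queries` helper name; it is a toy version of the query-proposal step, not VoxFormer's implementation.

```python
import numpy as np

def visible_voxel_queries(depth, f, cx, cy, voxel_size=0.2):
    """Back-project a depth map and return the unique occupied voxel indices.

    depth: (H, W) metric depth, 0 where invalid; f, cx, cy: pinhole intrinsics.
    """
    H, W = depth.shape
    vs, us = np.mgrid[0:H, 0:W]
    z = depth.ravel()
    valid = z > 0
    # Pinhole back-projection to camera-frame 3D points.
    x = (us.ravel() - cx) * z / f
    y = (vs.ravel() - cy) * z / f
    pts = np.stack([x, y, z], axis=1)[valid]
    # Quantize to voxel indices and deduplicate: these become the sparse queries.
    vox = np.unique(np.floor(pts / voxel_size).astype(int), axis=0)
    return vox
```

Only these depth-supported voxels seed the Transformer queries; the densification stage then fills in the remaining (occluded) volume.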
3D reconstruction from a single image is a key problem in multiple applications ranging from robotic manipulation to augmented reality. Prior methods have tackled this problem through generative models which predict 3D reconstructions as voxels or point clouds. However, these methods can be computationally expensive and miss fine details. We introduce a new differentiable layer for 3D data deformation and use it in DeformNet to learn a model for 3D reconstruction-through-deformation. DeformNet takes an image as input, finds the nearest shape template from a database,...
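The retrieval step, finding the nearest shape template before deforming it, can be sketched with a symmetric Chamfer distance over point sets. This is a generic numpy sketch; in the actual system the retrieval is driven by learned image features rather than point-set distances.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between two (N, 3) / (M, 3) point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Average nearest-neighbor distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def nearest_template(query, templates):
    """Return the index of the template closest to the query shape."""
    return int(np.argmin([chamfer(query, t) for t in templates]))
```

The retrieved template then serves as the starting point for the differentiable deformation layer, which warps it toward the observed shape.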
Supervised 3D reconstruction has witnessed a significant progress through the use of deep neural networks. However, this increase in performance requires large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D supervision as an alternative for expensive 3D CAD annotation. Specifically, we use foreground masks as weak supervision through a raytrace pooling layer that enables perspective projection and backpropagation. Additionally, since 3D reconstruction from masks is an ill posed problem, we propose to constrain the reconstruction to the manifold of unlabeled...
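The core of a raytrace pooling layer is max-pooling voxel occupancy along camera rays to produce a silhouette that can be compared against a foreground mask. The sketch below uses a toy perspective setup (a unit cube in front of a pinhole camera) with assumed geometry; it illustrates the projection, not the paper's CUDA layer.

```python
import numpy as np

def raytrace_max_pool(vox, f, img_size):
    """Project (D, H, W) voxel occupancy to a 2D silhouette by max along rays.

    Toy setup: the voxel grid spans a unit cube at depths 1..2 in front of a
    pinhole camera with focal length f and principal point at the image center.
    """
    D, H, W = vox.shape
    mask = np.zeros((img_size, img_size))
    zs = np.linspace(1.0, 2.0, D)
    ys = np.linspace(-0.5, 0.5, H)
    xs = np.linspace(-0.5, 0.5, W)
    for i, z in enumerate(zs):
        for j, y in enumerate(ys):
            for k, x in enumerate(xs):
                # Perspective projection of the voxel center.
                u = int(f * x / z + img_size / 2)
                v = int(f * y / z + img_size / 2)
                if 0 <= u < img_size and 0 <= v < img_size:
                    # Max-pool: a ray is "on" if any voxel along it is occupied.
                    mask[v, u] = max(mask[v, u], vox[i, j, k])
    return mask
```

Because the pooled silhouette is differentiable with respect to voxel occupancy (max-pooling passes gradients to the occupied voxel), a mask loss can supervise the 3D prediction directly.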
We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where the two tasks are guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pairwise potential that encodes cross-image pixel relationships both within and across boxes. Minimizing the teacher energy simultaneously yields refined object masks and dense correspondences between intra-class objects, which are taken as pseudo-labels to supervise...
High-dimensional geometric patterns appear in many computer vision problems. In this work, we present high-dimensional convolutional networks for pattern recognition problems that arise in 2D and 3D registration. We first propose networks from 4 to 32 dimensions and analyze their capacity on linear regression problems. Next, we show that 3D correspondences form a hyper-surface in a 6-dimensional space and validate our network on 3D registration. Finally, we use our networks for image correspondences, which form a 4-dimensional hyper-conic section, and are on par with state-of-the-art...
Manipulating volumetric deformable objects in the real world, like plush toys and pizza dough, brings substantial challenges due to infinite shape variations, non-rigid motions, and partial observability. We introduce ACID, an action-conditional visual dynamics model for volumetric deformable objects based on structured implicit neural representations. ACID integrates two new techniques: implicit representations for action-conditional dynamics and geodesics-based contrastive learning. To represent deformable dynamics from RGB-D observations, we learn implicit representations of occupancy and flow-based...
A large body of recent work on object detection has focused on exploiting 3D CAD model databases to improve detection performance. Many of these approaches work by aligning exact 3D models to images using templates generated from renderings of the models at a set of discrete viewpoints. However, the training procedures for these approaches are computationally expensive and require gigabytes of memory and storage, while the viewpoint discretization hampers pose estimation performance. We propose an efficient method for synthesizing templates that runs on the fly - that is, it quickly produces...