- 3D Shape Modeling and Analysis
- Robotics and Sensor-Based Localization
- Advanced Vision and Imaging
- 3D Surveying and Cultural Heritage
- Advanced Image and Video Retrieval Techniques
- Remote Sensing and LiDAR Applications
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Thyroid and Parathyroid Surgery
- Microfluidic and Bio-sensing Technologies
- Digital Holography and Microscopy
- Image Retrieval and Classification Techniques
- Congenital Heart Disease Studies
- Adversarial Robustness in Machine Learning
- Cognitive and Developmental Aspects of Mathematical Skills
- Cardiovascular Syncope and Autonomic Disorders
- Design Education and Practice
- Microfluidic and Capillary Electrophoresis Applications
- Creativity in Education and Neuroscience
- Cardiac Arrhythmias and Treatments
- Anomaly Detection Techniques and Applications
- Advanced Image Processing Techniques
- Explainable Artificial Intelligence (XAI)
University of Michigan
2020-2024
Istituto Tecnico Industriale Alessandro Volta
2021
Weatherford College
2021
Georgia Institute of Technology
2015-2016
University of California, San Francisco
2016
Aligning partial views of a scene into a single whole is essential to understanding one's environment and is a key component of numerous robotics tasks such as SLAM and SfM. Recent approaches have proposed end-to-end systems that can outperform traditional methods by leveraging pose supervision. However, with the rising prevalence of cameras with depth sensors, we can expect a new stream of raw RGB-D data without the annotations needed for supervision. We propose UnsupervisedR&R: an unsupervised approach to learning point cloud registration...
Geometric feature extraction is a crucial component of point cloud registration pipelines. Recent work has demonstrated how supervised learning can be leveraged to learn better and more compact 3D features. However, those approaches' reliance on ground-truth annotation limits their scalability. We propose BYOC: a self-supervised approach that learns visual and geometric features from RGB-D video without relying on ground-truth pose or correspondence. Our key observation is that randomly-initialized CNNs readily provide...
Although an object may appear in numerous contexts, we often describe it in a limited number of ways. Language allows us to abstract away visual variation to represent and communicate concepts. Building on this intuition, we propose an alternative approach to visual representation learning: using language similarity to sample semantically similar image pairs for contrastive learning. Our approach diverges from image-based contrastive learning by sampling view pairs using language similarity instead of hand-crafted augmentations or learned clusters. Our approach also differs from image-text...
The goal of this paper is to estimate the viewpoint for a novel object. Standard viewpoint estimation approaches generally fail on this task due to their reliance on a 3D model for alignment or on large amounts of class-specific training data and their corresponding canonical pose. We overcome those limitations by learning a reconstruct-and-align approach. Our key insight is that although we do not have an explicit 3D model or a predefined canonical pose, we can still learn the object's shape in the viewer's frame and then use an image to provide our reference. In particular, we propose...
Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by only relying on close-by pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach that learns from multiview consistency in short RGB-D video sequences. Our approach combines pairwise correspondence estimation and registration with a novel SE(3) transformation synchronization algorithm...
Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, but their intermediate representations are also useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure. In this work, we analyze the 3D awareness of visual foundation models. We posit that 3D awareness implies that representations (1) encode the 3D structure of the scene and (2) consistently represent the surface across...
Humans have an unparalleled visual intelligence and can overcome visual ambiguities that machines currently cannot. Recent works have shown that incorporating guidance from humans during inference for monocular viewpoint estimation can help overcome difficult cases in which the computer alone would have otherwise failed. These hybrid intelligence approaches are hence gaining traction. However, deciding what question to ask the human at inference time remains an unknown for these problems. We address this by formulating it as the Adviser Problem: we learn a...