- Robotics and Sensor-Based Localization
- Robot Manipulation and Learning
- Advanced Vision and Imaging
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Robotic Path Planning Algorithms
- Video Surveillance and Tracking Methods
- Optical Measurement and Interference Techniques
- Domain Adaptation and Few-Shot Learning
- Reinforcement Learning in Robotics
- Multimodal Machine Learning Applications
- Image and Object Detection Techniques
- Advanced Image Processing Techniques
- Image Processing Techniques and Applications
- 3D Surveying and Cultural Heritage
- Speech and Audio Processing
- Autonomous Vehicle Technology and Safety
- Advanced Image and Video Retrieval Techniques
- Image Processing and 3D Reconstruction
- Tactile and Sensory Interactions
- Remote Sensing and LiDAR Applications
- Computer Graphics and Visualization Techniques
- Hand Gesture Recognition Systems
- Music and Audio Processing
- Soft Robotics and Applications
Nvidia (United States)
2017-2024
Nvidia (United Kingdom)
2018-2024
Georgia Institute of Technology
2021
Istituto Tecnico Industriale Alessandro Volta
2021
Weatherford College
2021
Seattle University
2020
Microsoft (United States)
2013-2015
Clemson University
2005-2014
Aalto University
2011
Stanford University
1998-2002
An algorithm for tracking a person's head is presented. The head's projection onto the image plane is modeled as an ellipse whose position and size are continually updated by a local search that combines the output of a module concentrating on the intensity gradient around the ellipse's perimeter with that of another module focusing on the color histogram of the ellipse's interior. Since these two modules have roughly orthogonal failure modes, they serve to complement one another. The result is a robust, real-time system that is able to track the head with enough accuracy...
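A minimal sketch of that local search, in plain Python. Assumptions not taken from the abstract: the image is a list of rows of grayscale intensities plus a parallel array of precomputed color-bin indices, the ellipse has a fixed aspect ratio of 1.2, the gradient module scores the mean magnitude of the gradient component along the ellipse normal, the color module uses histogram intersection, each module's scores are divided by their maximum over the search window before summing, and the window is ±2 pixels in position and ±1 pixel in size. All function names are hypothetical.

```python
import math

def gradient(img, x, y):
    """Central-difference intensity gradient at pixel (x, y), clamped at the borders."""
    h, w = len(img), len(img[0])
    gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
    gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
    return gx, gy

def gradient_score(img, cx, cy, b, aspect=1.2, n=36):
    """Mean |normal . gradient| over n points on the ellipse perimeter.
    The ellipse has half-width b and half-height aspect * b."""
    h, w = len(img), len(img[0])
    a = aspect * b
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        x = int(round(cx + b * math.cos(t)))
        y = int(round(cy + a * math.sin(t)))
        if not (0 <= x < w and 0 <= y < h):
            continue  # perimeter points outside the image contribute nothing
        nx, ny = a * math.cos(t), b * math.sin(t)  # outward normal at parameter t
        gx, gy = gradient(img, x, y)
        total += abs(nx * gx + ny * gy) / math.hypot(nx, ny)
    return total / n

def interior_histogram(bins, cx, cy, b, aspect=1.2, nbins=8):
    """Color histogram of the pixels inside the ellipse; bins[y][x] is a bin index."""
    h, w = len(bins), len(bins[0])
    a = aspect * b
    hist = [0] * nbins
    for y in range(max(0, int(cy - a)), min(h, int(cy + a) + 1)):
        for x in range(max(0, int(cx - b)), min(w, int(cx + b) + 1)):
            if ((x - cx) / b) ** 2 + ((y - cy) / a) ** 2 <= 1.0:
                hist[bins[y][x]] += 1
    return hist

def color_score(model, hist):
    """Histogram intersection, normalized by the candidate histogram's total count."""
    total = sum(hist)
    if total == 0:
        return 0.0
    return sum(min(m, c) for m, c in zip(model, hist)) / total

def track_step(img, bins, model, state, aspect=1.2, nbins=8, dxy=2, ds=1):
    """One local-search step around state = (cx, cy, b); returns the best candidate."""
    cx, cy, b = state
    candidates = [(cx + dx, cy + dy, b + db)
                  for dx in range(-dxy, dxy + 1)
                  for dy in range(-dxy, dxy + 1)
                  for db in range(-ds, ds + 1) if b + db > 1]
    g = [gradient_score(img, x, y, s, aspect) for x, y, s in candidates]
    c = [color_score(model, interior_histogram(bins, x, y, s, aspect, nbins))
         for x, y, s in candidates]
    # Scale each module by its maximum over the window so that neither dominates.
    gmax, cmax = (max(g) or 1.0), (max(c) or 1.0)
    scores = [gi / gmax + ci / cmax for gi, ci in zip(g, c)]
    return candidates[scores.index(max(scores))]
```

Summing the two normalized scores means a candidate that one module rates poorly (for example, a cluttered edge that fools the gradient term) can still be rejected by the other, which is the complementarity the abstract describes.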
We present a system for training deep neural networks for object detection using synthetic images. To handle the variability in real-world data, the system relies upon the technique of domain randomization, in which the parameters of the simulator, such as lighting, pose, textures, etc., are randomized in non-realistic ways to force the neural network to learn the essential features of the object of interest. We explore the importance of these parameters, showing that it is possible to produce a network with compelling performance using only non-artistically-generated data. With additional...
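As an illustration of domain randomization, here is a hypothetical sampler that draws one scene configuration per synthetic training image. The parameter names and ranges are assumptions chosen for illustration, not the settings used in this work; a renderer (not shown) would consume each `SceneParams`.

```python
import random
from dataclasses import dataclass

@dataclass
class SceneParams:
    num_lights: int
    light_intensity: float
    object_yaw_deg: float
    object_pitch_deg: float
    camera_distance_m: float
    texture_id: int
    num_distractors: int

def sample_scene(rng: random.Random, num_textures: int = 1000) -> SceneParams:
    """Draw one randomized scene; all ranges are illustrative assumptions."""
    return SceneParams(
        num_lights=rng.randint(1, 12),               # inclusive bounds
        light_intensity=rng.uniform(0.2, 5.0),
        object_yaw_deg=rng.uniform(0.0, 360.0),
        object_pitch_deg=rng.uniform(-90.0, 90.0),
        camera_distance_m=rng.uniform(2.0, 20.0),
        texture_id=rng.randrange(num_textures),      # random, often non-realistic texture
        num_distractors=rng.randint(0, 20),          # clutter objects added to the scene
    )
```

Passing an explicit `random.Random` instance keeps dataset generation reproducible from a seed.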
Because of image sampling, traditional measures of pixel dissimilarity can assign a large value to two corresponding pixels in a stereo pair, even in the absence of noise and other degrading effects. We propose a measure that is provably insensitive to sampling because it uses the linearly interpolated intensity functions surrounding the pixels. Experiments on real images show that our measure alleviates the problem with little additional computational overhead.
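The measure can be sketched for one scanline using its commonly cited formulation: each pixel's intensity is compared against the range spanned by the other image's linearly interpolated intensity within half a pixel on either side of the candidate match, and the smaller of the two directional values is kept. Function names are hypothetical, and edge pixels simply reuse their own value.

```python
def _half_range(row, x):
    """Min and max of the linearly interpolated intensity within +/- 0.5 pixel of x."""
    v = row[x]
    left = (v + row[x - 1]) / 2.0 if x > 0 else v
    right = (v + row[x + 1]) / 2.0 if x < len(row) - 1 else v
    return min(left, right, v), max(left, right, v)

def bt_dissimilarity(left_row, right_row, xl, xr):
    """Sampling-insensitive dissimilarity between left_row[xl] and right_row[xr]."""
    rmin, rmax = _half_range(right_row, xr)
    d_lr = max(0.0, left_row[xl] - rmax, rmin - left_row[xl])
    lmin, lmax = _half_range(left_row, xl)
    d_rl = max(0.0, right_row[xr] - lmax, lmin - right_row[xr])
    return min(d_lr, d_rl)
```

For example, with `left = [0, 0, 10, 20, 20]` and a half-pixel-shifted `right = [0, 5, 15, 20, 20]`, the absolute difference at index 2 is 5, while this measure is 0 because 10 lies within the interpolated range [10, 17.5] around `right[2]`.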
We introduce the concept of a spatiogram, which is a generalization of a histogram that includes potentially higher-order moments. A histogram is a zeroth-order spatiogram, while second-order spatiograms contain spatial means and covariances for each histogram bin. This spatial information still allows quite general transformations, as in a histogram, but captures a richer description of the target to increase robustness in tracking. We show how to use spatiograms in kernel-based trackers, deriving a mean shift procedure in which individual pixels vote not only for the amount of shift but also for its direction....
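A second-order spatiogram can be computed in a single pass by accumulating, for each bin, the pixel count, the sum of coordinates, and the sum of coordinate products. This sketch assumes each pixel has already been assigned to a bin and uses the population (divide-by-n) covariance; the function name is hypothetical, and the tracking and mean shift parts are not shown.

```python
def spatiogram(pixels, nbins):
    """pixels: iterable of (x, y, b), where b is the pixel's histogram bin.
    Returns, per bin, (count, (mean_x, mean_y), 2x2 covariance)."""
    counts = [0] * nbins
    sums = [[0.0, 0.0] for _ in range(nbins)]       # sum of x, sum of y
    prods = [[0.0, 0.0, 0.0] for _ in range(nbins)]  # sum of x*x, x*y, y*y
    for x, y, b in pixels:
        counts[b] += 1
        sums[b][0] += x
        sums[b][1] += y
        prods[b][0] += x * x
        prods[b][1] += x * y
        prods[b][2] += y * y
    result = []
    for b in range(nbins):
        n = counts[b]
        if n == 0:
            result.append((0, (0.0, 0.0), ((0.0, 0.0), (0.0, 0.0))))
            continue
        mx, my = sums[b][0] / n, sums[b][1] / n
        cxx = prods[b][0] / n - mx * mx
        cxy = prods[b][1] / n - mx * my
        cyy = prods[b][2] / n - my * my
        result.append((n, (mx, my), ((cxx, cxy), (cxy, cyy))))
    return result
```

The counts alone form the ordinary (zeroth-order) histogram, so two regions with identical histograms but different spatial layouts, such as `[(0, 0, 0), (1, 0, 1)]` and `[(1, 0, 0), (0, 0, 1)]`, produce equal counts but different per-bin means.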
Urban traffic optimization using cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains 200K...
Using synthetic data for training deep neural networks for robotic manipulation holds the promise of an almost unlimited amount of pre-labeled data, generated safely out of harm's way. One of the key challenges to date has been to bridge the so-called reality gap, so that networks trained on synthetic data operate correctly when exposed to real-world data. We explore the reality gap in the context of 6-DoF pose estimation of known objects from a single RGB image. We show that for this problem the reality gap can be successfully spanned by a simple combination of domain randomized and...
We present structured domain randomization (SDR), a variant of domain randomization (DR) that takes into account the structure of the scene in order to add context to the generated data. In contrast to DR, which places objects and distractors randomly according to a uniform probability distribution, SDR places objects and distractors randomly according to probability distributions that arise from the specific problem at hand. In this manner, SDR-generated imagery enables the neural network to take the context around an object into consideration during detection. We demonstrate the power of SDR for 2D bounding box car detection, achieving...
We present a micro aerial vehicle (MAV) system, built with inexpensive off-the-shelf hardware, for autonomously following trails in unstructured, outdoor environments such as forests. The system introduces a deep neural network (DNN) called TrailNet for estimating the view orientation and lateral offset of the MAV with respect to the trail center. The DNN-based controller achieves stable flight without oscillations by avoiding overconfident behavior through a loss function that includes both label smoothing and entropy...
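The two loss ingredients named above can be sketched for a single classification head. Because the abstract is truncated, the full TrailNet loss may contain further terms; here the number of classes (for example, three: trail to the left, center, or right), the smoothing amount `eps`, and the entropy weight `beta` are assumptions. The entropy is subtracted, so low-entropy (overconfident) outputs raise the loss.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def smoothed_ce_with_entropy(logits, label, eps=0.05, beta=0.1):
    """Cross-entropy against label-smoothed targets, minus beta times the output entropy."""
    k = len(logits)
    p = softmax(logits)
    # Label smoothing: spread eps of the target mass uniformly over all k classes.
    target = [eps / k + (1.0 - eps if i == label else 0.0) for i in range(k)]
    ce = -sum(t * math.log(max(pi, 1e-12)) for t, pi in zip(target, p))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return ce - beta * entropy
```

Both terms push in the same direction: smoothing makes near-one-hot predictions costly, and the entropy reward favors spreading probability when the evidence is ambiguous.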
In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity between vehicles produced by different manufacturers). To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. This approach includes two innovations...
We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related dataset through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a robotics-relevant task: generating safe robot grasps in human-to-robot handover.
We present a near real-time (10Hz) method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process...
DERVISH won the Office Delivery event of the 1994 Robot Competition and Exhibition, held as part of the Thirteenth National Conference on Artificial Intelligence. Although the contest required Dervish to navigate in an artificial office environment, the official goal was to push the technology of robot navigation in real buildings with minimal domain information. Dervish navigates reliably using retractable assumptions that simplify the planning problem. In this article, we present a short description of Dervish's hardware and low-level...
Slanted surfaces pose a problem for correspondence algorithms utilizing search because of the greatly increased number of possibilities when compared with fronto-parallel surfaces. In this paper we propose an algorithm to compute correspondence between stereo images or between frames of a motion sequence by minimizing an energy functional that accounts for slanted surfaces. The energy is minimized in a greedy strategy that alternates between segmenting the image into non-overlapping regions (using the multiway-cut algorithm of Boykov, Veksler, and Zabih) and finding the affine parameters...
An algorithm to detect depth discontinuities from a stereo pair of images is presented. The algorithm matches individual pixels in corresponding scanline pairs while allowing occluded pixels to remain unmatched, then propagates the information between scanlines by means of a fast postprocessor. The algorithm handles large untextured regions, uses a measure of pixel dissimilarity that is insensitive to image sampling, and prunes bad search nodes to increase the speed of dynamic programming. The computation is relatively fast, taking about 1.5 microseconds...
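A simplified version of the scanline matching step can be written as a dynamic program over one pair of scanlines: each pixel is either matched (cost: dissimilarity minus a match reward) or left unmatched as occluded (cost: a fixed penalty), and the ordering of matches is preserved. Simplifications relative to the abstract: occlusion is penalized per pixel, absolute intensity difference is the default dissimilarity (a sampling-insensitive measure could be passed in instead), and there is no search-node pruning and no between-scanline postprocessor. Parameter values and names are hypothetical.

```python
def abs_diff(a, b):
    return abs(a - b)

def match_scanline(left, right, occ=20.0, reward=5.0, max_disp=None, dissim=abs_diff):
    """Return (matches, cost); matches is a list of (x_left, x_right) pairs."""
    n, m = len(left), len(right)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best, move = inf, None
            # Match left[i-1] with right[j-1] if the disparity is allowed.
            if i > 0 and j > 0 and (max_disp is None or 0 <= i - j <= max_disp):
                c = cost[i - 1][j - 1] + dissim(left[i - 1], right[j - 1]) - reward
                if c < best:
                    best, move = c, "match"
            # Leave left[i-1] unmatched (occluded in the right image).
            if i > 0 and cost[i - 1][j] + occ < best:
                best, move = cost[i - 1][j] + occ, "occ_left"
            # Leave right[j-1] unmatched (occluded in the left image).
            if j > 0 and cost[i][j - 1] + occ < best:
                best, move = cost[i][j - 1] + occ, "occ_right"
            cost[i][j], back[i][j] = best, move
    # Backtrack from the full alignment to recover the matched pairs.
    i, j, matches = n, m, []
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "occ_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return matches, cost[n][m]
```

With `left = [0, 10, 20, 30, 40, 50]` and `right = [10, 20, 30, 40, 50, 60]`, the program matches `left[1:]` to `right[:-1]` at disparity 1 and leaves `left[0]` and `right[5]` unmatched; a change of disparity along the matched path is where a depth discontinuity would be reported.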
We present a new dataset, called Falling Things (FAT), for advancing the state of the art in object detection and 3D pose estimation in the context of robotics. By synthetically combining object models and backgrounds of complex composition and high graphical quality, we are able to generate photorealistic images with accurate annotations for all objects in all images. Our dataset contains 60k annotated photos of 21 household objects taken from YCB [2]. For each image, we provide the poses, per-pixel class segmentation, 2D/3D bounding box...
We present a method for segmenting and tracking vehicles on highways using a camera that is relatively low to the ground. At such low angles, 3-D perspective effects cause significant changes in appearance over time, as well as severe occlusions by vehicles in neighboring lanes. Traditional approaches to occlusion reasoning assume that the vehicles initially appear separated in the image; however, in our sequences it is not uncommon for vehicles to enter the scene partially occluded and remain so throughout. By utilizing a mapping from the image, along with a plumb line...
We present an approach to visual tracking based on dividing a target into multiple regions, or fragments. The target is represented by a Gaussian mixture model in a joint feature-spatial space, with each ellipsoid corresponding to a different fragment. The fragments are automatically adapted to the image data, being selected by an efficient region-growing procedure and updated according to a weighted average of past statistics. Modeling of the background is performed in a Chan-Vese manner, using the framework of level sets to preserve accurate...
Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, depth-based teleoperation system, DexPilot, was developed that allows complete control over the full 23 DoA system by merely observing the bare human hand. DexPilot...
We revisit the problem of visual depth estimation in the context of autonomous vehicles. Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large, a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving. We argue that the challenges of removing this gap are significant, owing to fundamental limitations of monocular vision. As a result, we focus our efforts on stereo. We propose a novel semi-supervised learning approach to training a deep neural network,...
We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then used to recover the camera extrinsics, assuming that the camera intrinsics and the joint configuration of the robot manipulator are known. Unlike classic hand-eye calibration systems, our method does not require...
We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, namely a geometric scene graph and a symbolic scene graph. This hierarchical representation serves as a structured, object-centric abstraction of manipulation scenes. Our model uses neural networks to process these scene graphs for predicting high-level plans and motions. We demonstrate that...