Stan Birchfield

ORCID: 0000-0001-7366-2441
Research Areas
  • Robotics and Sensor-Based Localization
  • Robot Manipulation and Learning
  • Advanced Vision and Imaging
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Robotic Path Planning Algorithms
  • Video Surveillance and Tracking Methods
  • Optical measurement and interference techniques
  • Domain Adaptation and Few-Shot Learning
  • Reinforcement Learning in Robotics
  • Multimodal Machine Learning Applications
  • Image and Object Detection Techniques
  • Advanced Image Processing Techniques
  • Image Processing Techniques and Applications
  • 3D Surveying and Cultural Heritage
  • Speech and Audio Processing
  • Autonomous Vehicle Technology and Safety
  • Advanced Image and Video Retrieval Techniques
  • Image Processing and 3D Reconstruction
  • Tactile and Sensory Interactions
  • Remote Sensing and LiDAR Applications
  • Computer Graphics and Visualization Techniques
  • Hand Gesture Recognition Systems
  • Music and Audio Processing
  • Soft Robotics and Applications

Affiliations

Nvidia (United States)
2017-2024

Nvidia (United Kingdom)
2018-2024

Georgia Institute of Technology
2021

Istituto Tecnico Industriale Alessandro Volta
2021

Weatherford College
2021

Seattle University
2020

Microsoft (United States)
2013-2015

Clemson University
2005-2014

Aalto University
2011

Stanford University
1998-2002

An algorithm for tracking a person's head is presented. The head's projection onto the image plane is modeled as an ellipse whose position and size are continually updated by a local search combining the output of a module concentrating on the intensity gradient around the ellipse's perimeter with that of another module focusing on the color histogram of the ellipse's interior. Since these two modules have roughly orthogonal failure modes, they serve to complement one another. The result is a robust, real-time system that is able to track a person's head with enough accuracy...

10.1109/cvpr.1998.698614 article EN 2002-11-27
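
To make the combination of the two modules concrete, here is a minimal sketch (not the paper's implementation) that scores a single ellipse hypothesis by adding a normalized gradient response along the perimeter to a histogram-intersection score for the interior. The helper names, bin count, and equal weights are illustrative assumptions.

```python
import numpy as np

def ellipse_perimeter(cx, cy, a, b, n=64):
    """Sample n integer pixel locations along an axis-aligned ellipse."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    xs = np.round(cx + a * np.cos(t)).astype(int)
    ys = np.round(cy + b * np.sin(t)).astype(int)
    return xs, ys

def interior_histogram(gray, cx, cy, a, b, bins=16):
    """Normalized intensity histogram of the pixels inside the ellipse."""
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0
    hist, _ = np.histogram(gray[mask], bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def combined_score(grad_mag, gray, model_hist, cx, cy, a, b):
    """Equal-weight sum of the mean gradient magnitude on the perimeter
    (normalized for 8-bit gradients) and the histogram intersection of
    the interior with the stored model histogram."""
    xs, ys = ellipse_perimeter(cx, cy, a, b)
    xs = np.clip(xs, 0, grad_mag.shape[1] - 1)
    ys = np.clip(ys, 0, grad_mag.shape[0] - 1)
    g = grad_mag[ys, xs].mean() / 255.0
    c = np.minimum(model_hist, interior_histogram(gray, cx, cy, a, b)).sum()
    return 0.5 * g + 0.5 * c
```

A local search would then evaluate this score over small perturbations of the ellipse's position and size and keep the best-scoring hypothesis as the new head state.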

We present a system for training deep neural networks for object detection using synthetic images. To handle the variability in real-world data, the system relies upon the technique of domain randomization, in which the parameters of the simulator (such as lighting, pose, object textures, etc.) are randomized in non-realistic ways to force the neural network to learn the essential features of the object of interest. We explore the importance of these parameters, showing that it is possible to produce a network with compelling performance using only non-artistically-generated synthetic data. With additional...

10.1109/cvprw.2018.00143 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01
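
As an illustration of the randomization idea described above, the following sketch draws one set of non-realistic scene parameters; the parameter names and ranges are hypothetical, not the paper's configuration, and a renderer would turn each sample into a labeled training image.

```python
import random

def sample_randomized_scene():
    """Draw one set of non-realistic scene parameters (hypothetical schema);
    a simulator would render an image from them and emit the labels for free."""
    return {
        "light_count": random.randint(1, 4),
        "light_color": [random.random() for _ in range(3)],
        "object_pose": {
            "xyz": [random.uniform(-1.0, 1.0) for _ in range(3)],
            "rpy": [random.uniform(-3.14, 3.14) for _ in range(3)],
        },
        "object_texture": random.choice(["noise", "checker", "flat_color"]),
        "distractor_count": random.randint(0, 10),
        "camera_fov_deg": random.uniform(40.0, 90.0),
    }

# Each training batch is rendered from freshly sampled parameters, so the
# detector never sees the same (deliberately unrealistic) scene twice.
```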

Because of image sampling, traditional measures of pixel dissimilarity can assign a large value to two corresponding pixels in a stereo pair, even in the absence of noise and other degrading effects. We propose a measure of dissimilarity that is provably insensitive to sampling because it uses the linearly interpolated intensity functions surrounding the pixels. Experiments on real images show that our measure alleviates the problem with little additional computational overhead.

10.1109/34.677269 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 1998-04-01
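
The sampling-insensitive idea can be sketched as follows: each pixel's intensity is compared against the range spanned by the other scanline's linearly interpolated intensity over a half-pixel neighborhood, and the measure is symmetrized by taking the minimum of the two directions. This is a minimal reading of the measure, assuming interior pixel indices and scanlines given as intensity sequences.

```python
def bt_dissimilarity(left_row, right_row, xl, xr):
    """Sampling-insensitive dissimilarity between left pixel xl and right pixel
    xr (interior indices only): compare each pixel against the min/max of the
    other scanline's linearly interpolated intensity over a half-pixel window,
    and take the smaller of the two one-sided values."""
    def one_sided(a, xa, b, xb):
        lo_half = 0.5 * (b[xb] + b[xb - 1])   # interpolated value at xb - 1/2
        hi_half = 0.5 * (b[xb] + b[xb + 1])   # interpolated value at xb + 1/2
        lo = min(lo_half, hi_half, b[xb])
        hi = max(lo_half, hi_half, b[xb])
        return max(0.0, a[xa] - hi, lo - a[xa])
    return min(one_sided(left_row, xl, right_row, xr),
               one_sided(right_row, xr, left_row, xl))

# Example: bt_dissimilarity([10, 50, 90], [10, 30, 90], 1, 1) returns 0.0,
# even though the sampled center pixels differ by 20 intensity levels.
```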

10.1023/a:1008160311296 article EN International Journal of Computer Vision 1999-01-01

We introduce the concept of a spatiogram, which is a generalization of a histogram that includes potentially higher-order moments. A histogram is a zeroth-order spatiogram, while second-order spatiograms contain spatial means and covariances for each histogram bin. This information still allows quite general transformations, as in a histogram, but captures a richer description of the target to increase robustness in tracking. We show how to use spatiograms in kernel-based trackers, deriving a mean shift procedure in which individual pixels vote not only for the amount of the shift but also for its direction....

10.1109/cvpr.2005.330 article EN 2005-07-27
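
A second-order spatiogram as described above can be computed by augmenting each histogram bin with the spatial mean and covariance of the coordinates of the pixels that fall into it; the sketch below assumes an 8-bit grayscale patch and simple uniform binning.

```python
import numpy as np

def spatiogram(patch, bins=8):
    """Second-order spatiogram of an 8-bit grayscale patch: for each intensity
    bin, the normalized pixel count, the spatial mean, and the spatial
    covariance of the coordinates of the pixels falling into that bin."""
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    coords = np.stack([xx.ravel(), yy.ravel()], axis=1).astype(float)
    labels = np.minimum(patch.ravel().astype(int) * bins // 256, bins - 1)

    counts = np.zeros(bins)
    means = np.zeros((bins, 2))
    covs = np.zeros((bins, 2, 2))
    for b in range(bins):
        pts = coords[labels == b]
        counts[b] = len(pts)
        if len(pts) > 1:
            means[b] = pts.mean(axis=0)
            covs[b] = np.cov(pts.T)
    counts /= max(counts.sum(), 1.0)   # the zeroth-order part is an ordinary histogram
    return counts, means, covs
```

A similarity between two spatiograms can then weight each bin's histogram agreement by how well the corresponding spatial distributions overlap, which is what gives the richer target description mentioned in the abstract.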

Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains 200K...

10.1109/cvpr.2019.00900 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Using synthetic data for training deep neural networks for robotic manipulation holds the promise of an almost unlimited amount of pre-labeled training data, generated safely out of harm's way. One of the key challenges of synthetic data, to date, has been to bridge the so-called reality gap, so that networks trained on synthetic data operate correctly when exposed to real-world data. We explore the reality gap in the context of 6-DoF pose estimation of known objects from a single RGB image. We show that this gap can be successfully spanned by a simple combination of domain randomized and...

10.48550/arxiv.1809.10790 preprint EN arXiv (Cornell University) 2018-01-01

We present structured domain randomization (SDR), a variant of domain randomization (DR) that takes into account the structure and context of the scene in order to add context to the generated data. In contrast to DR, which places objects and distractors randomly according to a uniform probability distribution, SDR places objects and distractors according to probability distributions that arise from the specific problem at hand. In this manner, SDR-generated imagery enables the neural network to take the context around an object into consideration during detection. We demonstrate the power of SDR for the problem of 2D bounding box car detection, achieving...

10.1109/icra.2019.8794443 article EN 2019 International Conference on Robotics and Automation (ICRA) 2019-05-01
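
The contrast between uniform and structured placement can be illustrated with a toy sketch: uniform (DR-style) placement scatters cars anywhere on the ground plane, while a structured (SDR-style) sampler places them along lane centerlines with small lateral and heading noise. The lane representation and noise levels here are hypothetical, not the paper's scene grammar.

```python
import random

def place_cars_uniform(n, extent=50.0):
    """DR-style placement: positions and headings drawn uniformly at random."""
    return [(random.uniform(-extent, extent),
             random.uniform(-extent, extent),
             random.uniform(0.0, 360.0)) for _ in range(n)]

def place_cars_structured(lane_centerlines, n):
    """SDR-style placement (toy version): sample a lane, then a point along it,
    and perturb position and heading slightly so cars stay in plausible context."""
    cars = []
    for _ in range(n):
        lane = random.choice(lane_centerlines)      # each lane: list of (x, y, heading)
        x, y, heading = random.choice(lane)
        cars.append((x + random.gauss(0.0, 0.3),
                     y + random.gauss(0.0, 0.3),
                     heading + random.gauss(0.0, 5.0)))
    return cars
```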

We present a micro aerial vehicle (MAV) system, built with inexpensive off-the-shelf hardware, for autonomously following trails in unstructured, outdoor environments such as forests. The system introduces a deep neural network (DNN) called TrailNet for estimating the view orientation and lateral offset of the MAV with respect to the trail center. The DNN-based controller achieves stable flight without oscillations by avoiding overconfident behavior through a loss function that includes both label smoothing and entropy...

10.1109/iros.2017.8206285 article EN 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017-09-01
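
A loss of the kind described, cross-entropy with label smoothing plus an entropy reward that discourages overconfident predictions, can be sketched in PyTorch as below; the smoothing factor and entropy weight are illustrative, not the values used for TrailNet.

```python
import torch
import torch.nn.functional as F

def smoothed_ce_with_entropy(logits, target, smoothing=0.1, entropy_weight=0.01):
    """Cross-entropy with label smoothing plus an entropy reward; subtracting
    the entropy term penalizes overly peaked (overconfident) class
    probabilities. Weights are illustrative."""
    num_classes = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)

    # Label smoothing: mix the one-hot target with a uniform distribution.
    with torch.no_grad():
        smooth = torch.full_like(log_p, smoothing / (num_classes - 1))
        smooth.scatter_(-1, target.unsqueeze(-1), 1.0 - smoothing)
    ce = -(smooth * log_p).sum(dim=-1).mean()

    # Entropy reward: higher output entropy lowers the loss.
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    return ce - entropy_weight * entropy
```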

In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity in shape and design among vehicles produced by different manufacturers). To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. This approach includes two innovations...

10.1109/iccv.2019.00030 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a robotics-relevant task: generating safe robot grasps in human-to-robot handover.

10.1109/cvpr46437.2021.00893 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

We present a near real-time (10 Hz) method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process...

10.1109/cvpr52729.2023.00066 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

10.1109/cvpr52733.2024.01692 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

DERVISH won the Office Delivery event of the 1994 Robot Competition and Exhibition, held as part of the Thirteenth National Conference on Artificial Intelligence. Although the contest required Dervish to navigate in an artificial office environment, the official goal was to push the technology of robot navigation toward real buildings with minimal domain information. Dervish navigates reliably using retractable assumptions that simplify the planning problem. In this article, we present a short description of Dervish's hardware and low-level...

10.1609/aimag.v16i2.1133 article EN AI Magazine 1995-06-15

Slanted surfaces pose a problem for correspondence algorithms utilizing search because of the greatly increased number of possibilities, when compared with fronto-parallel surfaces. In this paper we propose an algorithm to compute correspondence between stereo images or between the frames of a motion sequence by minimizing an energy functional that accounts for slanted surfaces. The energy is minimized in a greedy strategy that alternates between segmenting the image into non-overlapping regions (using the multiway-cut algorithm of Boykov, Veksler, and Zabih) and finding the affine parameters...

10.1109/iccv.1999.791261 article EN 1999-01-01
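
The per-segment affine model can be made concrete with a small sketch: given the pixels assigned to one region and their current disparities, the parameters of d(x, y) = a*x + b*y + c are recovered by least squares. The function names are illustrative.

```python
import numpy as np

def fit_affine_disparity(xs, ys, ds):
    """Least-squares fit of d(x, y) = a*x + b*y + c over one region's pixels."""
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(float)
    params, *_ = np.linalg.lstsq(A, np.asarray(ds, dtype=float), rcond=None)
    return params  # (a, b, c)

def predicted_disparity(params, xs, ys):
    a, b, c = params
    return a * np.asarray(xs) + b * np.asarray(ys) + c

# The greedy minimization described above would alternate between reassigning
# pixels to regions (multiway-cut) and re-fitting these per-region parameters.
```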

An algorithm to detect depth discontinuities from a stereo pair of images is presented. The algorithm matches individual pixels in corresponding scanline pairs while allowing occluded pixels to remain unmatched, then propagates the information between scanlines by means of a fast postprocessor. The algorithm handles large untextured regions, uses a measure of pixel dissimilarity that is insensitive to image sampling, and prunes bad search nodes to increase the speed of dynamic programming. The computation is relatively fast, taking about 1.5 microseconds...

10.1109/iccv.1998.710850 article EN 2002-11-27
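
A simplified sketch of per-scanline matching with unmatched (occluded) pixels: a dynamic program chooses, at each step, between matching a pixel pair or declaring one pixel occluded at a fixed penalty. The occlusion penalty and cost structure are illustrative, and the dissimilarity callback could be the sampling-insensitive measure sketched earlier.

```python
import numpy as np

def scanline_match_cost(n_left, n_right, dissim, occlusion_penalty=20.0):
    """Dynamic program over one scanline pair that either matches left pixel i
    to right pixel j, or leaves a pixel unmatched (occluded) at a fixed
    penalty. Returns the optimal total cost; backtracking through the same
    table recovers the matches and occlusions."""
    cost = np.full((n_left + 1, n_right + 1), np.inf)
    cost[0, :] = occlusion_penalty * np.arange(n_right + 1)
    cost[:, 0] = occlusion_penalty * np.arange(n_left + 1)
    for i in range(1, n_left + 1):
        for j in range(1, n_right + 1):
            cost[i, j] = min(cost[i - 1, j - 1] + dissim(i - 1, j - 1),  # match
                             cost[i - 1, j] + occlusion_penalty,         # left pixel occluded
                             cost[i, j - 1] + occlusion_penalty)         # right pixel occluded
    return cost[n_left, n_right]
```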

We present a new dataset, called Falling Things (FAT), for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics. By synthetically combining object models and backgrounds of complex composition and high graphical quality, we are able to generate photorealistic images with accurate annotations for all objects in all images. Our dataset contains 60k annotated photos of 21 household objects taken from YCB [2]. For each image, we provide the poses, per-pixel class segmentation, and 2D/3D bounding box...

10.1109/cvprw.2018.00275 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01

We present a method for segmenting and tracking vehicles on highways using a camera that is relatively low to the ground. At such low angles, 3-D perspective effects cause significant changes in appearance over time, as well as severe occlusions by vehicles in neighboring lanes. Traditional approaches to occlusion reasoning assume that vehicles initially appear well separated in the image; however, in our sequences, it is not uncommon for vehicles to enter the scene partially occluded and remain so throughout. By utilizing a mapping from the scene to the image, along with plumb line...

10.1109/tits.2007.911357 article EN IEEE Transactions on Intelligent Transportation Systems 2008-02-29

We present an approach to visual tracking based on dividing a target into multiple regions, or fragments. The target is represented by a Gaussian mixture model in a joint feature-spatial space, with each ellipsoid corresponding to a different fragment. The fragments are automatically adapted to the image data, being selected by an efficient region-growing procedure and updated according to a weighted average of past statistics. Modeling of the background is performed in a Chan-Vese manner, using the framework of level sets to preserve accurate...

10.1109/iccv.2009.5459276 article EN 2009-09-01

Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, depth-based teleoperation system, DexPilot, was developed that allows complete control over the full 23-DoA robotic system by merely observing the bare human hand. DexPilot...

10.1109/icra40945.2020.9197124 article EN 2020-05-01

We revisit the problem of visual depth estimation in the context of autonomous vehicles. Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large, a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving. We argue that the challenges of removing this gap are significant, owing to fundamental limitations of monocular vision. As a result, we focus our efforts on stereo. We propose a novel semi-supervised learning approach for training a deep stereo neural network,...

10.1109/cvprw.2018.00147 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01
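
A semi-supervised objective of the general kind described, a supervised term where sparse ground truth exists plus a photometric reconstruction term that warps the right image toward the left with the predicted disparity, can be sketched in PyTorch as below; the term weights, loss choices, and tensor layout are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_stereo_loss(disp_pred, left, right, gt_disp, gt_mask,
                                w_sup=1.0, w_photo=1.0):
    """Supervised term on pixels with sparse ground truth (e.g., LiDAR) plus a
    photometric term that reconstructs the left image by warping the right
    image with the predicted disparity. Shapes: images (B, C, H, W),
    disparities and boolean mask (B, 1, H, W)."""
    sup = F.smooth_l1_loss(disp_pred[gt_mask], gt_disp[gt_mask])

    b, _, h, w = left.shape
    xs = torch.linspace(-1.0, 1.0, w, device=left.device).view(1, 1, w).expand(b, h, w)
    ys = torch.linspace(-1.0, 1.0, h, device=left.device).view(1, h, 1).expand(b, h, w)
    xs_warped = xs - 2.0 * disp_pred.squeeze(1) / (w - 1)   # shift by disparity
    grid = torch.stack([xs_warped, ys], dim=-1)              # (B, H, W, 2)
    right_warped = F.grid_sample(right, grid, align_corners=True)
    photo = (left - right_warped).abs().mean()

    return w_sup * sup + w_photo * photo
```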

We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then used to recover the camera extrinsics, assuming that the camera intrinsics and the joint configuration of the robot manipulator are known. Unlike classic hand-eye calibration systems, our method does not require...

10.1109/icra40945.2020.9196596 article EN 2020-05-01
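
The final PnP step described above can be sketched with OpenCV: given the detected 2D keypoints, their 3D positions in the robot base frame (known from the joint configuration via forward kinematics), and the camera intrinsics, solvePnP returns the camera-from-robot transform. The wrapper below is a minimal sketch, not the authors' code.

```python
import numpy as np
import cv2

def camera_from_robot(keypoints_2d, keypoints_3d, K):
    """Recover the camera pose relative to the robot base from detected 2D
    keypoints and their 3D positions in the robot frame, using OpenCV's
    PnP solver; lens distortion is assumed negligible here."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(keypoints_3d, dtype=np.float64),  # (N, 3) points, robot frame
        np.asarray(keypoints_2d, dtype=np.float64),  # (N, 2) pixel detections
        np.asarray(K, dtype=np.float64),             # (3, 3) camera intrinsics
        None)                                        # no distortion coefficients
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T   # maps robot-frame points into the camera frame
```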

We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, namely a geometric scene graph and a symbolic scene graph. This representation serves as a structured, object-centric abstraction of manipulation scenes. Our model uses neural networks to process these scene graphs for predicting high-level plans and low-level motions. We demonstrate that...

10.1109/icra48506.2021.9561548 article EN 2021-05-30