- Robot Manipulation and Learning
- Multimodal Machine Learning Applications
- Reinforcement Learning in Robotics
- Robotics and Sensor-Based Localization
- Natural Language Processing Techniques
- Advanced Vision and Imaging
- Topic Modeling
- Soft Robotics and Applications
- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- 3D Shape Modeling and Analysis
- Robotic Path Planning Algorithms
- 3D Surveying and Cultural Heritage
- Advanced Neural Network Applications
- Muscle Activation and Electromyography Studies
- Advanced Image and Video Retrieval Techniques
- Motor Control and Adaptation
- Remote Sensing and LiDAR Applications
- Robotic Mechanisms and Dynamics
- Tactile and Sensory Interactions
- Machine Learning and Data Classification
- Modular Robots and Swarm Intelligence
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
Google (United States)
2019-2024
Princeton University
2016-2020
Carnegie Mellon University
2015
Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse set of panoramic views over entire buildings...
This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Previous work has considered scene completion and semantic labeling of depth maps separately. However, we observe that these two problems are tightly intertwined. To leverage the coupled nature of these two tasks, we introduce the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum. Our...
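A minimal PyTorch sketch of this kind of architecture (an end-to-end 3D CNN mapping a voxelized depth observation to per-voxel class logits, where an "empty" class lets occupancy and semantics fall out together) might look as follows; the layer sizes and TSDF-style input encoding are illustrative assumptions, not the published SSCNet.

```python
import torch
import torch.nn as nn

class TinySSCNet(nn.Module):
    """Toy stand-in for SSCNet: a 3D conv trunk emitting per-voxel logits.

    Input: a voxelized encoding of a single depth map (e.g. a TSDF volume),
    shape (B, 1, D, H, W). Output: (B, num_classes, D, H, W); class 0 can be
    read as "empty", so completion and labeling are predicted jointly.
    """

    def __init__(self, num_classes: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            # dilation widens the receptive field for scene-level context
            nn.Conv3d(32, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv3d(32, num_classes, kernel_size=1),  # per-voxel logits
        )

    def forward(self, tsdf: torch.Tensor) -> torch.Tensor:
        return self.net(tsdf)

logits = TinySSCNet()(torch.randn(1, 1, 32, 32, 32))
print(logits.shape)  # torch.Size([1, 12, 32, 32, 32])
```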
Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose a self-supervised feature learning...
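To make the descriptor-learning setup concrete, here is a hedged sketch: a small 3D conv encoder trained with a triplet loss on matching and non-matching volumetric patches. The patch size, layer widths, and loss choice are assumptions for illustration, not the published model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEncoder(nn.Module):
    """Toy volumetric patch descriptor: 3D convs -> fixed-length vector."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 16, 3), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.fc = nn.LazyLinear(dim)

    def forward(self, patch):                  # (B, 1, 16, 16, 16)
        x = self.conv(patch).flatten(1)
        return F.normalize(self.fc(x), dim=1)  # unit-length descriptor

# Correspondence labels can come "for free" from RGB-D reconstructions:
# two patches observed at the same 3D point in different frames form a
# positive pair; any other patch serves as a negative.
enc = PatchEncoder()
anchor, pos, neg = (torch.randn(8, 1, 16, 16, 16) for _ in range(3))
loss = nn.TripletMarginLoss(margin=0.5)(enc(anchor), enc(pos), enc(neg))
loss.backward()
```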
Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning. Our method involves training two fully convolutional networks that map from visual observations to actions: one...
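The pixel-wise action parameterization can be sketched as follows: two hypothetical fully convolutional networks each emit a per-pixel Q-value map, and a greedy policy picks the primitive and location jointly. Rotated inputs (covering different push and grasp angles) are omitted for brevity.

```python
import torch

def select_action(push_net, grasp_net, heightmap):
    """Greedy selection over pixel-wise Q-values. `push_net` and `grasp_net`
    are placeholder fully convolutional nets mapping a (1, C, H, W) visual
    heightmap to a (1, 1, H, W) Q-value map; each pixel corresponds to
    executing that primitive at that location in the scene."""
    with torch.no_grad():
        q = torch.cat([push_net(heightmap), grasp_net(heightmap)], dim=1)
    flat = q.flatten()
    idx = int(flat.argmax())
    primitive, yx = divmod(idx, q.shape[2] * q.shape[3])
    y, x = divmod(yx, q.shape[3])
    return ("push", "grasp")[primitive], (y, x), float(flat[idx])

# demo with an untrained stand-in network
net = torch.nn.Conv2d(4, 1, 1)
print(select_action(net, net, torch.randn(1, 4, 64, 64)))
```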
Robot warehouse automation has attracted significant interest in recent years, perhaps most visibly in the Amazon Picking Challenge (APC) [1]. A fully autonomous pick-and-place system requires robust vision that reliably recognizes and locates objects amid cluttered environments, self-occlusions, sensor noise, and a large variety of objects. In this paper we present an approach that leverages multiview RGB-D data and self-supervised, data-driven learning to overcome those difficulties. The approach was part...
This paper presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses a category-agnostic affordance prediction algorithm to select and execute among four different grasping primitive behaviors. It then recognizes picked objects with a cross-domain image classification framework that matches...
Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Inputs to our model are multi-modal sentences that interleave visual, continuous state estimation, and textual input encodings. We train these encodings end-to-end, in conjunction with a pre-trained large...
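A toy illustration of the "multi-modal sentence" idea, under assumed dimensions: image features are projected into the language model's embedding space and spliced between word embeddings, so the LM consumes them like ordinary tokens.

```python
import torch
import torch.nn as nn

# Dimensions and the single-vector image encoding are illustrative
# assumptions, not the published model's configuration.
vocab, d_model, d_img = 1000, 256, 512
word_emb = nn.Embedding(vocab, d_model)
img_proj = nn.Linear(d_img, d_model)    # trained end-to-end with the LM

tokens = torch.tensor([5, 17, 0, 42])   # "... <img> ..." with 0 = placeholder
seq = word_emb(tokens)                  # (4, d_model)
img_feat = torch.randn(1, d_img)        # from a vision encoder
seq = torch.cat([seq[:2], img_proj(img_feat), seq[3:]])  # splice at <img>
# `seq` is then fed to the pre-trained LM like any other embedded sentence.
```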
We investigate whether a robot arm can learn to pick and throw arbitrary rigid objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring grasps suitable for reliable throwing, to handling varying object-centric properties (e.g., mass distribution, friction, shape) and complex aerodynamics. In this work, we propose an end-to-end...
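One way to read "end-to-end" here is the residual-physics pattern: a ballistic estimate supplies an initial release speed and a learned network corrects it. The sketch below assumes a simplified 45-degree throw landing at release height; the feature and layer sizes are illustrative.

```python
import math
import torch

G = 9.81  # gravitational acceleration, m/s^2

def ballistic_release_speed(range_m: float) -> float:
    """Release speed for a 45-degree throw landing at release height,
    from the projectile range equation r = v^2 / g. A deliberate
    simplification of a physics-based controller used as a prior."""
    return math.sqrt(G * range_m)

# A small network predicts a residual on top of the physics estimate:
# physics supplies a reasonable initial velocity, and learning absorbs
# object-specific effects such as grasp offset, friction, and drag.
residual_net = torch.nn.Sequential(
    torch.nn.Linear(64, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

def release_speed(visual_feat: torch.Tensor, range_m: float) -> float:
    v0 = ballistic_release_speed(range_m)
    return v0 + float(residual_net(visual_feat))  # learned delta-v

print(release_speed(torch.randn(64), 1.5))
```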
Large language models (LLMs) trained on code-completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When provided as input several example commands (formatted as comments) followed by...
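A minimal sketch of the prompting format this describes, assuming a generic code-completion function `complete` and hypothetical robot API names (`get_obj_pos`, `put_first_on_second`, `detect_objects`):

```python
# Few-shot prompt: example commands as comments, each followed by policy
# code that calls perception and control primitive APIs.
PROMPT = '''
# move the red block to the left of the bowl.
pos = get_obj_pos("bowl")
put_first_on_second("red block", (pos[0] - 0.1, pos[1]))

# if there is a sponge, put it in the tray.
if "sponge" in detect_objects():
    put_first_on_second("sponge", get_obj_pos("tray"))
'''.strip()

def write_policy(command: str, complete) -> str:
    """Ask a code-writing LLM to continue the few-shot prompt with policy
    code for a new natural language command."""
    return complete(PROMPT + f"\n\n# {command}\n")

# The generated string can then be executed against the robot's API
# namespace, e.g.:
# exec(write_policy("stack all blocks", complete), robot_api_namespace)
```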
Transparent objects are a common part of everyday life, yet they possess unique visual properties that make them incredibly difficult for standard 3D sensors to produce accurate depth estimates for. In many cases, they often appear as noisy or distorted approximations of the surfaces that lie behind them. To address these challenges, we present ClearGrasp - a deep learning approach for estimating accurate 3D geometry of transparent objects from a single RGB-D image for robotic manipulation. Given a single RGB-D image of transparent objects, ClearGrasp uses deep convolutional networks to infer...
Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to language. LLMs planning in embodied environments need to consider not just what to do, but also when to do it - answers that change over time in response to the agent's own choices...
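A compact sketch of such a closed loop, with every model and skill callable left as a placeholder: textual feedback after each step is appended to the prompt, so the planner can retry, re-order, or stop.

```python
def embodied_loop(goal: str, llm, skills: dict, detect_success, max_steps=10):
    """Closed-loop LLM planning sketch. `llm` returns the next skill name
    given the dialogue so far; `skills` maps names to executable primitives;
    `detect_success` is a placeholder feedback source (e.g. a success
    detector or scene describer whose output is verbalized as text)."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm("\n".join(history))       # e.g. "pick up the sponge"
        if action == "done":
            break
        skills[action]()                       # execute the primitive
        ok = detect_success(action)            # feedback in language
        history.append(f"Robot: {action}. Success: {ok}.")
    return history
```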
Intelligent manipulation benefits from the capacity to flexibly control an end-effector with high degrees of freedom (DoF) and dynamically react to the environment. However, due to challenges in collecting effective training data and learning efficiently, most grasping algorithms today are limited to top-down movements and open-loop execution. In this work, we propose a new low-cost hardware interface for collecting grasping demonstrations by people in diverse environments. This makes it possible to train a robust end-to-end 6DoF...
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT questions, code). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this diversity is symbiotic, and can be leveraged through...
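As a toy example of the composition pattern (not the paper's exact pipeline), language can serve as the intermediate representation between models: a VLM verbalizes the image, and an LM reasons over the resulting text. Both callables below are placeholders for pretrained models; no fine-tuning is involved, only prompting.

```python
def socratic_answer(frame, question, vlm_caption, lm_complete):
    """Zero-shot multimodal reasoning via model-to-model dialogue in text.
    `vlm_caption` turns an image into a description; `lm_complete` is any
    text-completion LLM."""
    caption = vlm_caption(frame)          # e.g. "a person holding a mug"
    prompt = (f"Scene: {caption}\n"
              f"Question: {question}\n"
              f"Answer:")
    return lm_complete(prompt)
```

The design point is that the models never share weights or embeddings; their only interface is natural language, which is what both were trained on.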
Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions). While this is useful for matching images to natural language descriptions of object goals, it remains disjoint from the process of mapping the environment, so that it lacks the spatial precision of classic geometric maps. To address this problem, we propose VLMaps, a spatial map representation that directly fuses pretrained visual-language features with a 3D reconstruction of the physical world...
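A hedged numpy sketch of the fusion and querying steps, with arbitrary grid resolution and feature dimensions; the encoders that would supply `feats` and `text_emb` are assumed, not included.

```python
import numpy as np

def fuse_features(grid, counts, points_xy, feats, cell=0.05):
    """Average per-pixel visual-language features into a top-down grid.
    `points_xy`: (N, 2) world coordinates back-projected from depth;
    `feats`: (N, D) features for the same pixels (e.g. from an
    open-vocabulary VLM). Grid size and resolution are assumptions."""
    ij = (points_xy / cell).astype(int)
    for (i, j), f in zip(ij, feats):
        grid[i, j] += f
        counts[i, j] += 1
    return grid / np.maximum(counts, 1)[..., None]

def localize(map_feats, text_emb):
    """Query the map with a text embedding: cosine similarity per cell."""
    norms = np.linalg.norm(map_feats, axis=-1) * np.linalg.norm(text_emb)
    sim = map_feats @ text_emb / np.maximum(norms, 1e-8)
    return np.unravel_index(np.argmax(sim), sim.shape)

g = fuse_features(np.zeros((40, 40, 8)), np.zeros((40, 40)),
                  np.random.rand(100, 2), np.random.rand(100, 8))
print(localize(g, np.random.rand(8)))  # grid cell best matching the query
```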
This article presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses an object-agnostic grasping framework to map from visual observations to actions: inferring dense pixel-wise probability maps of the affordances for four different grasping primitive actions. It then executes the action with...
We investigate whether a robot arm can learn to pick and throw arbitrary objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring reliable pre-throw conditions (e.g. grasp of the object) to handling varying object-centric properties (e.g. mass distribution, friction, shape) and dynamics (e.g. aerodynamics). In this work, we propose an end-to-end...
Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, part of an object, or an end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input - which can parameterize robot actions. It makes no assumptions of objectness (e.g. canonical poses, models, keypoints), it exploits spatial symmetries, and is orders of magnitude more sample efficient than our...
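The feature-rearranging step can be sketched as cross-correlation: crop deep features around the pick location and convolve the crop over the scene to score every candidate placement. In the sketch below, raw tensors stand in for learned feature maps, and rotations of the crop (which cover different place orientations) are omitted.

```python
import torch
import torch.nn.functional as F

def transport_scores(scene_feat, pick_yx, crop=8):
    """Transporter-style placement scoring sketch: use the feature patch
    around the picked location as a convolution kernel over the whole
    scene, so each output pixel scores "place the picked patch here"."""
    y, x = pick_yx
    kernel = scene_feat[:, :, y - crop:y + crop, x - crop:x + crop]
    return F.conv2d(scene_feat, kernel, padding=crop)  # (1, 1, H+1, W+1)

scene = torch.randn(1, 16, 64, 64)   # stand-in for FCN features
scores = transport_scores(scene, (32, 32))
place = torch.nonzero(scores[0, 0] == scores.max())[0]
print(place)                          # best place location (y, x)
```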
Is it possible to learn policies for robotic assembly that can generalize to new objects? We explore this idea in the context of the kit assembly task. Since classic methods rely heavily on object pose estimation, they often struggle to generalize to new objects without 3D CAD models or task-specific training data. In this work, we propose to formulate the kit assembly task as a shape matching problem, where the goal is to learn a shape descriptor that establishes geometric correspondences between object surfaces and their target placement locations from visual input. This formulation...
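A minimal sketch of the matching step under assumed descriptor shapes: given a descriptor for the picked object and dense descriptors over the kit image (both produced by trained networks not shown here), the placement is the most similar cell.

```python
import numpy as np

def best_placement(obj_desc, kit_desc):
    """Shape-matching sketch: `obj_desc` is a (D,) descriptor of the object
    surface; `kit_desc` is (H, W, D) dense descriptors over the kit image.
    Returns the kit pixel whose descriptor best matches the object's."""
    sim = kit_desc @ obj_desc
    sim /= np.linalg.norm(kit_desc, axis=-1) * np.linalg.norm(obj_desc) + 1e-8
    return np.unravel_index(np.argmax(sim), sim.shape)

print(best_placement(np.random.rand(32), np.random.rand(48, 48, 32)))
```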
Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as object poses, and may involve relative spatial relations such as "place the item inside the bag". In this work, we develop a suite of simulated benchmarks with...
For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf....
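One plausible reading of the personalization mechanism, sketched with a placeholder text-completion function `complete`: summarize a few observed placements into a general rule, then let the model extend the rule to unseen objects. The prompt format here is an assumption for illustration, not the paper's exact prompt.

```python
# Few observed placements, formatted so a summary can be elicited.
EXAMPLES = """objects = ["yellow shirt", "dark purple shirt", "white socks"]
receptacles = ["drawer", "shelf"]
pick_and_place("yellow shirt", "drawer")
pick_and_place("dark purple shirt", "drawer")
pick_and_place("white socks", "drawer")
# Summary:"""

def summarize_preferences(complete) -> str:
    """Compress user examples into a reusable rule via LLM summarization,
    e.g. "put clothing in the drawer"."""
    return complete(EXAMPLES)

def place_new_object(rule: str, obj: str, complete) -> str:
    """Apply the summarized rule to a previously unseen object."""
    return complete(f'# Rule: {rule}\npick_and_place("{obj}", "')
```

The appeal of summarization here is data efficiency: a handful of examples becomes one short rule that transfers to new objects without retraining.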
Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated them as semantic planners or relied on human-engineered control primitives to interface...
Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex...
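The conformal-prediction core can be sketched in a few lines, following the standard split-conformal recipe (the quantile level and score definitions below are the textbook ones, not necessarily the paper's exact choices).

```python
import numpy as np

def conformal_threshold(true_option_conf, eps=0.1):
    """Split conformal calibration sketch. `true_option_conf` holds the
    planner's confidence in the ground-truth option on held-out calibration
    tasks. Nonconformity is 1 - confidence; the ceil((n+1)(1-eps))/n
    quantile of those scores yields a confidence threshold such that the
    prediction set covers the correct option with probability >= 1-eps."""
    scores = 1.0 - np.asarray(true_option_conf)
    n = len(scores)
    qhat = np.quantile(scores, np.ceil((n + 1) * (1 - eps)) / n,
                       method="higher")
    return 1.0 - qhat

def prediction_set(option_scores, thresh):
    """Keep every option the planner cannot rule out; if more than one
    survives, the robot should stop and ask a human for help."""
    keep = [i for i, s in enumerate(option_scores) if s >= thresh]
    return keep, len(keep) > 1       # (options, ask_for_help)

t = conformal_threshold(np.random.beta(5, 2, size=400), eps=0.15)
print(prediction_set([0.55, 0.30, 0.15], t))
```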
People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate such expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich...