Andy Zeng

ORCID: 0000-0002-4319-2159
Research Areas
  • Robot Manipulation and Learning
  • Multimodal Machine Learning Applications
  • Reinforcement Learning in Robotics
  • Robotics and Sensor-Based Localization
  • Natural Language Processing Techniques
  • Advanced Vision and Imaging
  • Topic Modeling
  • Soft Robotics and Applications
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • 3D Shape Modeling and Analysis
  • Robotic Path Planning Algorithms
  • 3D Surveying and Cultural Heritage
  • Advanced Neural Network Applications
  • Muscle Activation and Electromyography Studies
  • Advanced Image and Video Retrieval Techniques
  • Motor Control and Adaptation
  • Remote Sensing and LiDAR Applications
  • Robotic Mechanisms and Dynamics
  • Tactile and Sensory Interactions
  • Machine Learning and Data Classification
  • Modular Robots and Swarm Intelligence
  • Video Analysis and Summarization
  • Anomaly Detection Techniques and Applications
  • Adversarial Robustness in Machine Learning

Google (United States)
2019-2024

Princeton University
2016-2020

Carnegie Mellon University
2015

Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings...

10.1109/3dv.2017.00081 article EN 2017 International Conference on 3D Vision (3DV) 2017-10-01

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Previous work has considered scene completion and semantic labeling of depth maps separately. However, we observe that these two problems are tightly intertwined. To leverage the coupled nature of these two tasks, we introduce the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum. Our...

10.1109/cvpr.2017.28 preprint EN 2017-07-01
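
The record above describes an end-to-end 3D convolutional network that maps a single-view depth observation to per-voxel occupancy and semantic labels. As a minimal sketch of that idea (not the published SSCNet architecture; the layer sizes, dilation rates, and class count below are illustrative assumptions), a dilated 3D CNN over a TSDF volume could look like:

```python
# Minimal sketch of joint scene completion + semantic labeling: a 3D CNN
# maps a single-view TSDF volume to per-voxel class logits, where class 0
# means "empty" and classes 1..N are semantic categories. Illustrative only.
import torch
import torch.nn as nn

class TinySSCNet(nn.Module):
    def __init__(self, num_classes: int = 12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            # dilated convolutions grow the receptive field to capture context
            nn.Conv3d(16, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        # a single head predicts occupancy and semantics jointly
        self.head = nn.Conv3d(32, num_classes + 1, kernel_size=1)

    def forward(self, tsdf):  # tsdf: (B, 1, D, H, W) single-view TSDF volume
        return self.head(self.encoder(tsdf))  # (B, C, D, H, W) per-voxel logits

logits = TinySSCNet()(torch.randn(1, 1, 64, 64, 64))
print(logits.shape)  # torch.Size([1, 13, 64, 64, 64])
```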

Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose a self-supervised feature learning...

10.1109/cvpr.2017.29 article EN 2017-07-01
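
Since the abstract describes learning a volumetric patch descriptor and matching it across partial scans, here is a hedged sketch of that pipeline, assuming a small 3D ConvNet over 30^3 TSDF patches and nearest-neighbor matching in descriptor space; the architecture and dimensions are stand-ins, not the published model:

```python
# Hedged sketch of a learned volumetric patch descriptor: a small 3D ConvNet
# embeds a 30^3 TSDF patch around a keypoint into a unit vector, and
# correspondences come from nearest neighbors in descriptor space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDescriptor(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, 3), nn.ReLU(),
            nn.Conv3d(32, 64, 3), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(64, 128, 3), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, patch):  # patch: (B, 1, 30, 30, 30) local TSDF volume
        return F.normalize(self.net(patch), dim=1)  # unit-length descriptor

desc = PatchDescriptor()
anchors = desc(torch.randn(8, 1, 30, 30, 30))      # patches from scan A
candidates = desc(torch.randn(8, 1, 30, 30, 30))   # patches from scan B
matches = torch.cdist(anchors, candidates).argmin(dim=1)  # nearest neighbors
```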

Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning. Our method involves training two fully convolutional networks that map from visual observations to actions: one...

10.1109/iros.2018.8593986 article EN 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018-10-01
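
To make the two-network, pixel-wise action parameterization concrete, here is a hedged sketch of the action selection step only: two placeholder fully convolutional networks score every heightmap pixel for pushing and grasping, and the argmax picks both the primitive and where to execute it. The real system's architectures, rotated heightmaps for grasp angles, and Q-learning updates are omitted.

```python
# Sketch: dense pixel-wise Q-values from two FCNs, one per motion primitive.
import torch
import torch.nn as nn

def make_fcn():
    # 1-channel heightmap in, 1-channel dense Q-map out at the same resolution
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 1),
    )

push_net, grasp_net = make_fcn(), make_fcn()
heightmap = torch.rand(1, 1, 224, 224)   # top-down depth heightmap of the bin

with torch.no_grad():
    q = torch.cat([push_net(heightmap), grasp_net(heightmap)], dim=1)  # (1,2,H,W)

idx = int(q.flatten().argmax())
primitive = ["push", "grasp"][idx // (224 * 224)]   # which Q-map won
row, col = divmod(idx % (224 * 224), 224)           # pixel -> workspace location
print(primitive, row, col)
```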

Robot warehouse automation has attracted significant interest in recent years, perhaps most visibly in the Amazon Picking Challenge (APC) [1]. A fully autonomous warehouse pick-and-place system requires robust vision that reliably recognizes and locates objects amid cluttered environments, self-occlusions, sensor noise, and a large variety of objects. In this paper we present an approach that leverages multiview RGB-D data and self-supervised, data-driven learning to overcome those difficulties. The approach was part...

10.1109/icra.2017.7989165 article EN 2017-05-01

This paper presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses a category-agnostic affordance prediction algorithm to select and execute among four different grasping primitive behaviors. It then recognizes picked objects with a cross-domain image classification framework that matches...

10.1109/icra.2018.8461044 preprint EN 2018-05-01

Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings...

10.48550/arxiv.1709.06158 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model are multi-modal sentences that interleave visual, continuous state estimation, and textual input encodings. We train these encodings end-to-end, in conjunction with a pre-trained large...

10.48550/arxiv.2303.03378 preprint EN other-oa arXiv (Cornell University) 2023-01-01

We investigate whether a robot arm can learn to pick and throw arbitrary rigid objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring grasps suitable for reliable throwing, to handling varying object-centric properties (e.g., mass distribution, friction, shape) and complex aerodynamics. In this work, we propose an end-to-end...

10.1109/tro.2020.2988642 article EN cc-by IEEE Transactions on Robotics 2020-06-01
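
The end-to-end system described above combines a physics controller with learned corrections. A toy sketch of that split, assuming a fixed 45-degree release, equal release and landing heights, and no air drag; the residual model is a stub standing in for the learned network:

```python
# Toy sketch of "physics proposes, learning corrects" for throwing.
import math

G = 9.81  # gravity, m/s^2

def ballistic_release_speed(distance_m: float) -> float:
    # projectile range at 45 degrees: d = v^2 / g  =>  v = sqrt(g * d)
    return math.sqrt(G * distance_m)

def residual_correction(observation) -> float:
    # stand-in for a network mapping perception of the grasped object
    # (mass distribution, friction, shape) to a velocity adjustment
    return 0.0

v = ballistic_release_speed(1.5) + residual_correction(None)
print(f"release speed: {v:.2f} m/s")  # ~3.84 m/s for a 1.5 m throw
```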

Large language models (LLMs) trained on code-completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When provided as input several example language commands (formatted as comments) followed by...

10.1109/icra48891.2023.10160591 article EN 2023-05-29
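
To illustrate the prompting scheme described above, here is a hedged sketch of what a few-shot prompt might look like: example commands as comments paired with policy code over perception and control APIs, which a code-completion LLM then continues for a new command. The API names (detect, pick, place, say) and the complete() wrapper are hypothetical stand-ins, not the paper's actual interface.

```python
# Hedged sketch of few-shot prompting for robot policy code. Commands are
# comments; the code below each one calls perception/control APIs.
PROMPT = '''
# put the red block on the blue bowl.
block = detect("red block")[0]
bowl = detect("blue bowl")[0]
pick(block)
place(bowl.pos)

# if there is a sponge, move it 10 cm to the left.
sponges = detect("sponge")
if sponges:
    pick(sponges[0])
    place(sponges[0].pos + [-0.10, 0.0, 0.0])
else:
    say("no sponge found")
'''

def complete(prompt: str) -> str:
    """Stand-in for a code-completion LLM call; returns policy code as text."""
    raise NotImplementedError

# Usage (pseudo): append a new command as a comment, let the LLM continue,
# then exec() the returned string against the real perception/control APIs.
# policy_code = complete(PROMPT + "\n# stack all blocks on the green bowl.\n")
```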

Transparent objects are a common part of everyday life, yet they possess unique visual properties that make them incredibly difficult for standard 3D sensors to produce accurate depth estimates for. In many cases, they often appear as noisy or distorted approximations of the surfaces that lie behind them. To address these challenges, we present ClearGrasp - a deep learning approach for estimating accurate 3D geometry of transparent objects from a single RGB-D image for robotic manipulation. Given a single RGB-D image of transparent objects, ClearGrasp uses deep convolutional networks to infer...

10.1109/icra40945.2020.9197518 article EN 2020-05-01

Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to language. LLMs planning in embodied environments need to consider not just what skills to do, but also when to do them - answers that change over time in response to the agent's own choices...

10.48550/arxiv.2207.05608 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Intelligent manipulation benefits from the capacity to flexibly control an end-effector with high degrees of freedom (DoF) and dynamically react to the environment. However, due to the challenges of collecting effective training data and learning efficiently, most grasping algorithms today are limited to top-down movements and open-loop execution. In this work, we propose a new low-cost hardware interface for collecting grasping demonstrations by people in diverse environments. This data makes it possible to train a robust end-to-end 6DoF...

10.1109/lra.2020.3004787 article EN IEEE Robotics and Automation Letters 2020-06-25

Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT questions, code). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this diversity is symbiotic, and can be leveraged through...

10.48550/arxiv.2204.00598 preprint EN other-oa arXiv (Cornell University) 2022-01-01
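
A hedged sketch of the composition pattern described above, where models exchange information purely as text and no finetuning occurs; vlm_caption and lm_complete are hypothetical wrappers around whichever pretrained models are plugged in:

```python
# Hedged sketch of multimodal composition via language: a VLM turns an image
# into text, and an LM reasons over that text.
def vlm_caption(image) -> str:
    """Stand-in: return a textual description of the image (e.g., via a VLM)."""
    raise NotImplementedError

def lm_complete(prompt: str) -> str:
    """Stand-in: return a language model's continuation of the prompt."""
    raise NotImplementedError

def answer_about_image(image, question: str) -> str:
    description = vlm_caption(image)          # percepts -> words
    prompt = f"Scene: {description}\nQ: {question}\nA:"
    return lm_complete(prompt)                # words -> commonsense inference
```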

Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions). While this is useful for matching images to natural language descriptions of object goals, it remains disjoint from the process of mapping the environment, so that it lacks the spatial precision of classic geometric maps. To address this problem, we propose VLMaps, a spatial map representation that directly fuses pretrained visual-language features with a 3D reconstruction of the physical world...

10.1109/icra48891.2023.10160969 article EN 2023-05-29
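
As a rough sketch of the fusion step described above (with camera geometry, the embedding models, and map resolution all stubbed or assumed), per-pixel visual-language embeddings could be averaged into a 2D grid and later queried with a text embedding:

```python
# Hedged sketch of fusing pixel-level visual-language embeddings into a 2D
# grid map and querying it with text. Embeddings, depth back-projection, and
# the 10 cm cell size are assumptions; real systems fuse into 3D with poses.
import numpy as np

GRID, DIM = 100, 512                       # cells per side, embedding dim
feat_sum = np.zeros((GRID, GRID, DIM))     # running sum of embeddings per cell
count = np.zeros((GRID, GRID, 1))

def fuse_frame(pixel_feats, world_xy):
    # pixel_feats: (N, DIM); world_xy: (N, 2) meters from depth + camera pose
    cells = np.clip((world_xy / 0.10).astype(int), 0, GRID - 1)  # 10 cm cells
    for (i, j), f in zip(cells, pixel_feats):
        feat_sum[i, j] += f
        count[i, j] += 1

def query(text_embedding):
    vlmap = feat_sum / np.maximum(count, 1)           # mean embedding per cell
    sim = vlmap @ text_embedding                      # cosine if unit-normalized
    return np.unravel_index(sim.argmax(), sim.shape)  # best-matching map cell

fuse_frame(np.random.randn(50, DIM), np.random.rand(50, 2) * 10.0)
print(query(np.random.randn(DIM)))  # e.g., the cell best matching "a chair"
```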

This article presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses an object-agnostic grasping framework to map from visual observations to actions: inferring dense pixel-wise probability maps of the affordances for four different grasping primitive actions. It then executes the action with...

10.1177/0278364919868017 article EN cc-by-nc The International Journal of Robotics Research 2019-08-02

We investigate whether a robot arm can learn to pick and throw arbitrary objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring reliable pre-throw conditions (e.g. grasp of the object) to handling varying object-centric properties (e.g. mass distribution, friction, shape) and dynamics (e.g. aerodynamics). In this work, we propose an end-to-end...

10.15607/rss.2019.xv.004 preprint EN 2019-06-22

Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, object part, or end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input - which can parameterize robot actions. It makes no assumptions of objectness (e.g. canonical poses, models, or keypoints), it exploits spatial symmetries, and is orders of magnitude more sample efficient than our...

10.48550/arxiv.2010.14406 preprint EN other-oa arXiv (Cornell University) 2020-01-01
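
A hedged sketch of the core rearranging-deep-features step: features cropped around a chosen pick location are cross-correlated against the scene's feature map to score candidate placements. The feature extractors are random placeholders here, and the rotated crops used for oriented placements are omitted:

```python
# Sketch: score placements by correlating a feature crop against the scene.
import torch
import torch.nn.functional as F

feats = torch.randn(1, 8, 64, 64)         # deep features of the observation
pick_logits = torch.randn(1, 1, 64, 64)   # output of the pick ("attention") net

idx = int(pick_logits.flatten().argmax())
pr, pc = divmod(idx, 64)                  # chosen pick pixel
pr, pc = min(max(pr, 8), 56), min(max(pc, 8), 56)  # keep a 16x16 crop in bounds

crop = feats[:, :, pr - 8:pr + 8, pc - 8:pc + 8]   # (1, 8, 16, 16) query kernel
place_scores = F.conv2d(feats, crop, padding=8)    # correlate kernel vs. scene
best = int(place_scores.flatten().argmax())
print("place at pixel", divmod(best, place_scores.shape[-1]))
```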

Is it possible to learn policies for robotic assembly that can generalize to new objects? We explore this idea in the context of the kit assembly task. Since classic methods rely heavily on object pose estimation, they often struggle to generalize to new objects without 3D CAD models or task-specific training data. In this work, we propose to formulate the kit assembly task as a shape matching problem, where the goal is to learn a shape descriptor that establishes geometric correspondences between object surfaces and their target placement locations from visual input. This formulation...

10.1109/icra40945.2020.9196733 article EN 2020-05-01

Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations, such as "place the item inside the bag". In this work, we develop a suite of simulated benchmarks with...

10.1109/icra48506.2021.9561391 article EN 2021-05-30

For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf...

10.1109/iros55552.2023.10341577 article EN 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023-10-01

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface...

10.48550/arxiv.2306.08647 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners, such that they know when they don't know, and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex...

10.48550/arxiv.2307.01928 preprint EN cc-by arXiv (Cornell University) 2023-01-01
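
Since the abstract leans on conformal prediction, here is a toy sketch of the calibrate-then-ask loop under simplified assumptions: option likelihoods are given directly (in practice they would come from LLM token scores), and the calibration data is synthetic:

```python
# Toy sketch of conformal prediction for asking for help: calibrate a
# threshold so the prediction set of candidate plans covers the correct one
# with probability 1 - eps, then ask a human whenever the set isn't a singleton.
import numpy as np

def calibrate(cal_probs, cal_labels, eps=0.1):
    # nonconformity score: 1 - probability assigned to the true option
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    level = np.ceil((n + 1) * (1 - eps)) / n
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, qhat):
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=200)   # synthetic calibration data
cal_labels = rng.integers(0, 4, size=200)
qhat = calibrate(cal_probs, cal_labels)

options = np.array([0.55, 0.30, 0.10, 0.05])      # scores for 4 candidate plans
pset = prediction_set(options, qhat)
if len(pset) == 1:
    print("confident: execute option", pset[0])
else:
    print("ambiguous: ask for help among options", pset)
```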

People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich...

10.1145/3610977.3634999 preprint EN other-oa 2024-03-10