Mengyuan Yan

ORCID: 0000-0002-5427-4192
Research Areas
  • Reinforcement Learning in Robotics
  • Robot Manipulation and Learning
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • 3D Shape Modeling and Analysis
  • Advanced Neural Network Applications
  • Fire Dynamics and Safety Research
  • Robotics and Sensor-Based Localization
  • Combustion and Flame Dynamics
  • Image Processing Techniques and Applications
  • Robotic Mechanisms and Dynamics
  • Computational Fluid Dynamics and Aerodynamics
  • Adversarial Robustness in Machine Learning
  • Machine Learning and Algorithms
  • Advanced SAR Imaging Techniques
  • Industrial Vision Systems and Defect Detection
  • Artificial Intelligence in Games
  • Advanced Manufacturing and Logistics Optimization
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Software Engineering Research
  • Domain Adaptation and Few-Shot Learning
  • Optical Systems and Laser Technology
  • Advanced Combustion Engine Technologies
  • Seismic Imaging and Inversion Techniques

Google (United States)
2023

Beihang University
2022-2023

Tianjin University
2020

Stanford University
2016-2020

Pennsylvania State University
1984-1986

3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-the-art methods rely on CNNs to address this problem. Recently, we witness two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both...

10.1109/cvpr.2016.609 article EN 2016-06-01

Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent,...

10.48550/arxiv.2204.01691 preprint EN other-oa arXiv (Cornell University) 2022-01-01
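A minimal sketch of the grounding idea the abstract gestures at: score each candidate robot skill by the product of a language-model likelihood and a learned affordance (value) estimate, then pick the argmax. All skill names and numbers below are hypothetical placeholders, not outputs of a real model.

```python
# Illustrative sketch: ground LLM knowledge with affordance values.
# Scores below are hypothetical placeholders, not real model outputs.

def select_skill(llm_scores, affordance, skills):
    """Choose the skill maximizing P_llm(skill | instruction) * P_afford(skill | state)."""
    return max(skills, key=lambda s: llm_scores[s] * affordance[s])

skills = ["pick up sponge", "go to counter", "pick up apple"]
# Hypothetical LLM likelihoods for the instruction "clean up the spill":
llm_scores = {"pick up sponge": 0.6, "go to counter": 0.3, "pick up apple": 0.1}
# Hypothetical value-function estimates that each skill can succeed right now:
affordance = {"pick up sponge": 0.9, "go to counter": 0.8, "pick up apple": 0.2}

print(select_skill(llm_scores, affordance, skills))  # -> pick up sponge
```

The product form lets the affordance term veto skills the language model finds plausible but the robot cannot currently execute.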

Understanding a dynamic 3D environment is crucial for robotic agents and many other applications. We propose a novel neural network architecture called MeteorNet for learning representations of dynamic 3D point cloud sequences. Different from previous work that adopts a grid-based representation and applies 3D or 4D convolutions, our network directly processes point clouds. We propose two ways to construct spatiotemporal neighborhoods for each point in the point cloud sequence. Information from these neighborhoods is aggregated to learn features per point. We benchmark on a variety of...

10.1109/iccv.2019.00934 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
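One of the neighborhood constructions the abstract mentions can be sketched as direct grouping: for a query point in frame t, gather points from frames within a temporal window whose spatial distance falls below a radius. The function name, radius, and window size here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Sketch of spatiotemporal "direct grouping" over a point cloud sequence.
# Radius and temporal window are illustrative hyperparameters.

def st_neighbors(seq, t, idx, radius=1.0, t_window=1):
    """seq: list of (N_i, 3) arrays, one array of points per frame.
    Returns (frame, point_index) pairs forming the spatiotemporal neighborhood."""
    query = seq[t][idx]
    neighbors = []
    for frame in range(max(0, t - t_window), min(len(seq), t + t_window + 1)):
        dists = np.linalg.norm(seq[frame] - query, axis=1)
        neighbors.extend((frame, int(j)) for j in np.nonzero(dists <= radius)[0])
    return neighbors

# Two tiny frames: the query point's neighborhood spans both.
seq = [np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]),
       np.array([[0.5, 0.0, 0.0], [9.0, 9.0, 9.0]])]
print(st_neighbors(seq, t=0, idx=0))  # -> [(0, 0), (1, 0)]
```

Per-point features would then be aggregated over these neighbor sets, which is what lets the network skip any grid-based intermediate representation.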

We demonstrate model-based, visual robot manipulation of deformable linear objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physics priors into the dynamics model and perception model, and of planning actions. In addition, the states can naturally represent object instances with different appearances. Therefore, the state space can be learned in one setting and directly used in other visually different settings. This is in contrast...

10.1109/lra.2020.2969931 article EN IEEE Robotics and Automation Letters 2020-01-28

In the context of deep learning for robotics, we show an effective method of training a real robot to grasp a tiny sphere (1.37 cm in diameter), with an original combination of system design choices. We decompose the end-to-end system into a vision module and a closed-loop controller module. The two modules use target object segmentation as their common interface. The vision module extracts information from the end-effector camera in the form of a binary mask of the target. We train it to achieve domain transfer by composing background images with simulated foregrounds. The controller module takes...

10.48550/arxiv.1712.03303 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Reinforcement learning (RL) provides a theoretical framework for continuously improving an agent's behavior via trial and error. However, efficiently learning policies from scratch can be very difficult, particularly for tasks with exploration challenges. In such settings, it might be desirable to initialize RL with an existing policy, offline data, or demonstrations. However, naively performing such initialization in RL often works poorly, especially for value-based methods. In this paper, we present a meta algorithm that can use...

10.48550/arxiv.2204.02372 preprint EN other-oa arXiv (Cornell University) 2022-01-01
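One simple way to exploit a prior policy, in the spirit of the abstract, is to let a "guide" policy drive the first h steps of each episode and hand control to the learning policy afterward. Everything below (environment, policy names, horizon) is a toy illustration, not the paper's algorithm verbatim.

```python
# Hedged sketch: roll in with a guide policy, then let the learner take over.

def rollout_with_guide(env_step, reset, guide, learner, horizon, h):
    """Roll out one episode: guide policy for the first h steps, learner afterwards."""
    state = reset()
    trajectory = []
    for t in range(horizon):
        action = guide(state) if t < h else learner(state)
        state, reward, done = env_step(state, action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory

# Toy chain environment: walk right from 0; reaching state 5 gives reward 1.
def reset():
    return 0

def env_step(state, action):
    nxt = state + action
    return nxt, (1.0 if nxt == 5 else 0.0), nxt == 5

guide = lambda s: 1      # competent prior policy: always step right
learner = lambda s: 0    # untrained policy: stays put

traj = rollout_with_guide(env_step, reset, guide, learner, horizon=10, h=5)
print(len(traj), traj[-1])  # -> 5 (1, 1.0): the guide reaches the goal in 5 steps
```

Shrinking h over training moves the learner gradually from states near the goal back toward the start, which sidesteps the hard-exploration problem the abstract describes.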

In this report, we propose a 3D reconstruction method for the full-view fisheye camera. The camera used is the Ricoh Theta, which captures spherical images and has a wide field of view (FOV). The conventional stereo approach based on the perspective camera model cannot be directly applied; instead, we use a spherical camera model to depict the relation between a 3D point and its corresponding observation in the image. We implemented a system that can reconstruct a scene using images from two or more cameras. A GUI is also created to allow users to control the viewpoint and obtain better intuition...

10.48550/arxiv.1506.06273 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Many previous works approach vision-based robotic grasping by training a value network that evaluates grasp proposals. These approaches require an optimization process at run-time to infer the best action from the value network. As a result, the inference time grows exponentially as the dimension of the action space increases. We propose an alternative method that directly trains a neural density model to approximate the conditional distribution of successful grasp poses given the input images. We construct a model that combines a Gaussian mixture and normalizing flows, which is...

10.1109/icra.2019.8794024 article EN 2019 International Conference on Robotics and Automation (ICRA) 2019-05-01
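To see why a density model avoids run-time optimization, here is a minimal sketch of evaluating and sampling a one-dimensional Gaussian mixture over grasp poses. In the paper the mixture and flow parameters are predicted from input images; here they are fixed hypothetical values, and the normalizing-flow component is omitted for brevity.

```python
import math
import random

# Hedged sketch: a 1-D Gaussian-mixture density over grasp poses.
# Parameters are hypothetical; the real model conditions them on images.

def gmm_pdf(x, weights, means, stds):
    """Density of a one-dimensional Gaussian mixture at x."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, stds))

def gmm_sample(weights, means, stds, rng=random):
    """Draw one grasp pose: pick a component, then sample it. No optimization loop."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[i], stds[i])

weights, means, stds = [0.7, 0.3], [0.0, 2.0], [0.5, 0.5]
print(round(gmm_pdf(0.0, weights, means, stds), 3))  # -> 0.559
```

Sampling is a constant-time ancestral draw, so inference cost stays flat as the pose dimension grows, in contrast to optimizing over a value network.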

We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of RL policies requires not only effective training algorithms, but also the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel...

10.15607/rss.2023.xix.022 article EN 2023-07-10

In this paper, we approach the challenging problem of motion planning for knot tying. We propose a hierarchical approach in which the top layer produces a topological plan and the bottom layer translates it into continuous robot motion. The top layer decomposes the knotting task into sequences of abstract actions based on knot theory. The bottom layer translates each of these abstract actions into motion trajectories through learned primitives. To adapt each action to the specific rope geometry, the primitives take the observed rope configuration as input. We train the primitives by imitating human demonstrations and through reinforcement learning...

10.1109/iros45743.2020.9341330 article EN 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020-10-24

Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods aimed to address...

10.48550/arxiv.2111.05424 preprint EN public-domain arXiv (Cornell University) 2021-01-01

The study focuses on addressing the image defocusing issue caused by motion errors in highly squinted Synthetic Aperture Radar (SAR). The traditional auto-focusing algorithm, Phase Gradient Autofocus (PGA), is not effective in this mode due to difficulties in estimating the phase gradient accurately from strong point targets. Two main reasons contribute to this problem. Firstly, the direction of the energy-distributed lines of the Point Spread Function (PSF) does not align with the image's azimuth direction in this mode. Secondly, the wavenumber spectrum...

10.3390/rs15122976 article EN cc-by Remote Sensing 2023-06-07
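For context on the baseline the abstract critiques, here is a toy sketch of the phase-gradient estimation step at the heart of PGA: several range bins share one azimuth phase error, whose gradient is estimated from conjugate products of adjacent azimuth samples, then integrated and removed. Real PGA also center-shifts and windows strong scatterers; both are omitted, and the scene model below is a hypothetical simplification.

```python
import numpy as np

# Toy PGA-style phase-gradient estimation on synthetic data with positive
# amplitudes and a shared azimuth phase error.

rng = np.random.default_rng(0)
n_bins, n_az = 8, 64
phase_err = np.cumsum(rng.normal(0.0, 0.1, n_az))   # shared azimuth phase error
amps = 1.0 + 0.5 * rng.random((n_bins, n_az))       # positive scatterer amplitudes
signal = amps * np.exp(1j * phase_err)

# Sum adjacent-sample conjugate products over range bins, then take the angle:
# for positive amplitudes this recovers the per-sample phase increments.
prod = np.sum(signal[:, 1:] * np.conj(signal[:, :-1]), axis=0)
phase_est = np.concatenate([[0.0], np.cumsum(np.angle(prod))])

corrected = signal * np.exp(-1j * phase_est)
residual = np.abs(np.angle(corrected * np.conj(corrected[:, :1])))
print(residual.max())  # near zero: the shared defocusing phase is removed
```

This estimator assumes the PSF energy lies along the azimuth direction, which is exactly the assumption the abstract says fails in the highly squinted mode.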

10.1007/s10851-019-00943-7 article EN Journal of Mathematical Imaging and Vision 2020-01-23

One fundamental difficulty in robotic learning is the sim-to-real gap problem. In this work, we propose to use segmentation as the interface between perception and control, serving as a domain-invariant state representation. We identify two sources of the gap, one being the dynamics gap and the other the visual gap. To close the dynamics gap, we use closed-loop control. For a complex task with mask input, we further learn a model-free control policy with a deep neural network using imitation learning. We model the real environment as a simulated target plus a background image, without any...

10.48550/arxiv.2005.07695 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep RL algorithms is the target network, which mitigates divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method that addresses divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining...

10.48550/arxiv.2009.08973 preprint EN other-oa arXiv (Cornell University) 2020-01-01
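The general idea of regularizing TD targets instead of maintaining a target network can be sketched in tabular form: the bootstrapped target is mixed with the current Q estimate, damping abrupt value changes. This is an illustrative toy, not the paper's actual objective or hyperparameters.

```python
# Hedged sketch: one regularized TD update without a target network.

def td_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, reg=0.1):
    """One regularized TD update on a tabular Q (a dict keyed by (state, action))."""
    bootstrap = r + gamma * max(Q[(s_next, b)] for b in actions)
    target = (1 - reg) * bootstrap + reg * Q[(s, a)]  # pull toward current value
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

# One update from an all-zero table: target = 0.9 * (1 + 0), step halves the gap.
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}
print(td_update(Q, s=0, a=0, r=1.0, s_next=1, actions=[0, 1]))  # -> 0.45
```

Because the regularizer anchors each update to the value being overwritten, no second, delayed copy of the Q function is needed to keep bootstrapping stable.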

3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-the-art methods rely on CNNs to address this problem. Recently, we witness two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both...

10.48550/arxiv.1604.03265 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Many previous works approach vision-based robotic grasping by training a value network that evaluates grasp proposals. These approaches require an optimization process at run-time to infer the best action from the value network. As a result, the inference time grows exponentially as the dimension of the action space increases. We propose an alternative method that directly trains a neural density model to approximate the conditional distribution of successful grasp poses given the input images. We construct a model that combines a Gaussian mixture and normalizing flows, which is...

10.48550/arxiv.1904.07319 preprint EN other-oa arXiv (Cornell University) 2019-01-01

We demonstrate model-based, visual robot manipulation of deformable linear objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physics priors into the dynamics model and perception model, and of planning actions. In addition, the states can naturally represent object instances with different appearances. Therefore, the state space can be learned in one setting and directly used in other visually different settings. This is in contrast...

10.48550/arxiv.1911.06283 preprint EN other-oa arXiv (Cornell University) 2019-01-01

In this paper, we approach the challenging problem of motion planning for knot tying. We propose a hierarchical approach in which the top layer produces a topological plan and the bottom layer translates it into continuous robot motion. The top layer decomposes the knotting task into sequences of abstract actions based on knot theory. The bottom layer translates each of these abstract actions into motion trajectories through learned primitives. To adapt each action to the specific rope geometry, the primitives take the observed rope configuration as input. We train the primitives by imitating human demonstrations and through reinforcement learning...

10.48550/arxiv.2009.02615 preprint EN other-oa arXiv (Cornell University) 2020-01-01

We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of RL policies requires not only effective training algorithms, but also the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects,...

10.48550/arxiv.2305.03270 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Understanding a dynamic 3D environment is crucial for robotic agents and many other applications. We propose a novel neural network architecture called MeteorNet for learning representations of dynamic 3D point cloud sequences. Different from previous work that adopts a grid-based representation and applies 3D or 4D convolutions, our network directly processes point clouds. We propose two ways to construct spatiotemporal neighborhoods for each point in the point cloud sequence. Information from these neighborhoods is aggregated to learn features per point. We benchmark on a variety of...

10.48550/arxiv.1910.09165 preprint EN other-oa arXiv (Cornell University) 2019-01-01