- Reinforcement Learning in Robotics
- Robot Manipulation and Learning
- Advanced Vision and Imaging
- Human Pose and Action Recognition
- 3D Shape Modeling and Analysis
- Advanced Neural Network Applications
- Fire dynamics and safety research
- Robotics and Sensor-Based Localization
- Combustion and flame dynamics
- Image Processing Techniques and Applications
- Robotic Mechanisms and Dynamics
- Computational Fluid Dynamics and Aerodynamics
- Adversarial Robustness in Machine Learning
- Machine Learning and Algorithms
- Advanced SAR Imaging Techniques
- Industrial Vision Systems and Defect Detection
- Artificial Intelligence in Games
- Advanced Manufacturing and Logistics Optimization
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Software Engineering Research
- Domain Adaptation and Few-Shot Learning
- Optical Systems and Laser Technology
- Advanced Combustion Engine Technologies
- Seismic Imaging and Inversion Techniques
Google (United States)
2023
Beihang University
2022-2023
Tianjin University
2020
Stanford University
2016-2020
Pennsylvania State University
1984-1986
3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-the-art methods rely on CNNs to address this problem. Recently, we witness two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing volumetric CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both...
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent,...
Understanding dynamic 3D environments is crucial for robotic agents and many other applications. We propose a novel neural network architecture called MeteorNet for learning representations for dynamic 3D point cloud sequences. Different from previous work that adopts a grid-based representation and applies 3D or 4D convolutions, our network directly processes point clouds. We propose two ways to construct spatiotemporal neighborhoods for each point in the point cloud sequence. Information from these neighborhoods is aggregated to learn features per point. We benchmark our network on a variety of...
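As a rough illustration of the spatiotemporal neighborhoods mentioned in the MeteorNet abstract above, the sketch below gathers, for one query point, every point within a fixed radius in the same and adjacent frames, then max-pools their feature vectors. The abstract does not say how the two neighborhood constructions work, so the fixed radius, the `max_dt` frame window, the max-pooling step, and both function names are assumptions made for this example rather than the paper's method.

```python
from math import dist


def spatiotemporal_neighbors(frames, t, i, radius, max_dt=1):
    """Return (frame_index, point_index) pairs near point i of frame t.

    frames: list of frames, each a list of (x, y, z) tuples.
    A point in frame s is a neighbor when |s - t| <= max_dt and its
    Euclidean distance to the query point is at most `radius`.
    (Hypothetical rule chosen for illustration.)
    """
    query = frames[t][i]
    first = max(0, t - max_dt)
    last = min(len(frames), t + max_dt + 1)
    neighbors = []
    for s in range(first, last):
        for j, point in enumerate(frames[s]):
            if dist(query, point) <= radius:
                neighbors.append((s, j))
    return neighbors


def aggregate_max(frames, features, t, i, radius, max_dt=1):
    """Max-pool per-point feature vectors over the neighborhood.

    features[s][j] is the feature vector of point j in frame s.
    The query point is always its own neighbor, so the pool is non-empty.
    """
    nbrs = spatiotemporal_neighbors(frames, t, i, radius, max_dt)
    vectors = [features[s][j] for s, j in nbrs]
    return [max(column) for column in zip(*vectors)]
```

A learned network would replace the plain max over raw features with a shared per-neighbor transform followed by pooling; the sketch only shows how a neighborhood can span several frames without voxelizing the point clouds.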
We demonstrate model-based, visual robot manipulation of deformable linear objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physics priors in the dynamics model and perception model, and the ease of planning actions. In addition, physical states can naturally represent object instances of different appearances. Therefore, dynamics in the state space can be learned in one setting and directly used in other visually different settings. In contrast...
In the context of deep learning for robotics, we show an effective method of training a real robot to grasp a tiny sphere (1.37 cm in diameter), with an original combination of system design choices. We decompose the end-to-end system into a vision module and a closed-loop controller module. The two modules use target object segmentation as their common interface. The vision module extracts information from the end-effector camera, in the form of a binary mask of the target. We train it to achieve domain transfer by composing background images with simulated images of the target. The controller module takes...
Reinforcement learning (RL) provides a theoretical framework for continuously improving an agent's behavior via trial and error. However, efficiently learning policies from scratch can be very difficult, particularly for tasks with exploration challenges. In such settings, it might be desirable to initialize RL with an existing policy, offline data, or demonstrations. However, naively performing such initialization in RL often works poorly, especially for value-based methods. In this paper, we present a meta algorithm that can use...
In this report, we propose a 3D reconstruction method for the full-view fisheye camera. The camera used is the Ricoh Theta, which captures spherical images and has a wide field of view (FOV). The conventional stereo approach based on the perspective camera model cannot be directly applied; instead, a spherical camera model is used to depict the relation between a 3D point and its corresponding observation in the image. We implemented a system that can reconstruct the 3D scene using images from two or more cameras. A GUI is also created to allow users to control the view and obtain a better intuition...
Many previous works approach vision-based robotic grasping by training a value network that evaluates grasp proposals. These approaches require an optimization process at run-time to infer the best action from the value network. As a result, the inference time grows exponentially as the dimension of the action space increases. We propose an alternative method, directly training a neural density model to approximate the conditional distribution of successful grasp poses from the input images. We construct a neural network that combines Gaussian mixture and normalizing flows, which is...
We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects,...
In this paper, we approach the challenging problem of motion planning for knot tying. We propose a hierarchical approach in which the top layer produces a topological plan and the bottom layer translates this plan into continuous robot motion. The top layer decomposes a knotting task into sequences of abstract topological actions based on knot theory. The bottom layer translates each of these abstract actions into robot motion trajectories through learned topological motion primitives. To adapt each topological action to the specific rope geometry, the motion primitives take the observed rope configuration as input. We train the motion primitives by imitating human demonstrations and reinforcement learning...
Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address...
The study focuses on addressing the image defocusing issue caused by motion errors in highly squinted Synthetic Aperture Radar (SAR). The traditional auto-focusing algorithm, Phase Gradient Autofocus (PGA), is not effective in this mode due to difficulties in estimating the phase gradient accurately from strong point targets. Two main reasons contribute to this problem. Firstly, the direction of the energy-distributed lines in the Point Spread Function (PSF) does not align with the image's azimuth direction in highly squinted mode. Secondly, the wavenumber spectrum...
One fundamental difficulty in robotic learning is the sim-real gap problem. In this work, we propose to use segmentation as the interface between perception and control, as a domain-invariant state representation. We identify two sources of sim-real gap: one is the dynamics sim-real gap, the other is the visual sim-real gap. To close the dynamics sim-real gap, we propose to use closed-loop control. For a complex task with segmentation mask input, we further propose to learn a closed-loop model-free control policy with a deep neural network using imitation learning. To close the visual sim-real gap, we propose to learn a perception model in the real environment using simulated target plus real background image, without any...
Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network, which mitigates divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method to address divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining...
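The delay that the abstract above attributes to target networks can be seen in a small tabular sketch: the bootstrapped target is computed from a frozen copy of the Q table, so newly learned values do not propagate to earlier states until the copy is synced. This shows only the standard target-network setup, not the self-regularized method, which the truncated abstract does not describe. The tabular setting, the function names, and the values of `alpha` and `gamma` are assumptions for illustration.

```python
def td_target(q_target, reward, next_state, gamma, done):
    """Bootstrapped target r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * max(q_target[next_state])


def q_update(q, q_target, transition, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step that bootstraps from q_target."""
    state, action, reward, next_state, done = transition
    y = td_target(q_target, reward, next_state, gamma, done)
    q[state][action] += alpha * (y - q[state][action])


def sync(q, q_target):
    """Hard update: copy the online table into the target table."""
    for s, row in enumerate(q):
        q_target[s] = list(row)


# Two states, two actions; state 1 leads to a terminal reward of 1.
q = [[0.0, 0.0], [0.0, 0.0]]
q_target = [[0.0, 0.0], [0.0, 0.0]]

q_update(q, q_target, (1, 0, 1.0, 1, True))   # q[1][0] learns the reward
q_update(q, q_target, (0, 0, 0.0, 1, False))  # stale target: q[0][0] unchanged
stale_value = q[0][0]

sync(q, q_target)
q_update(q, q_target, (0, 0, 0.0, 1, False))  # value now propagates to state 0
fresh_value = q[0][0]
```

Before the sync, the update for state 0 sees a target of zero even though state 1 has already learned a positive value; after the sync, the same transition moves `q[0][0]` toward the discounted value.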