- Human Pose and Action Recognition
- Robot Manipulation and Learning
- 3D Shape Modeling and Analysis
- Reinforcement Learning in Robotics
- Advanced Vision and Imaging
- Human Motion and Animation
- Tactile and Sensory Interactions
- Computer Graphics and Visualization Techniques
- Hand Gesture Recognition Systems
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Image Processing Techniques and Applications
- Robotic Locomotion and Control
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Advanced Image Processing Techniques
- Teleoperation and Haptic Systems
- Advanced Algorithms and Applications
- Fluid Dynamics and Heat Transfer
- Fault Detection and Control Systems
- Virtual Reality Applications and Impacts
- Interactive and Immersive Displays
- Image Processing and 3D Reconstruction
- Advanced Sensor and Control Systems
- Astronomical Observations and Instrumentation
University of California, San Diego (2020-2024)
National Synchrotron Radiation Laboratory (2024)
University of Science and Technology of China (2024)
Northwestern Polytechnical University (2024)
UC San Diego Health System (2021-2023)
Yanshan University (2023)
China Aerodynamics Research and Development Center (2022)
Jiangnan University (2021)
Ocean University of China (2021)
Shanghai Jiao Tong University (2020)
How to represent an image? While the visual world is presented in a continuous manner, machines store and see images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by recent progress in 3D reconstruction with implicit neural representation, we propose Local Implicit Image Function (LIIF), which takes an image coordinate and the deep features around that coordinate as inputs, and predicts the RGB value at the given coordinate as output. Since the coordinates are continuous, LIIF can be presented in arbitrary...
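The core LIIF idea — decode the RGB value at a continuous coordinate from a nearby latent feature plus the relative offset to it — can be sketched as below. This is a minimal illustration, not the trained model: the feature grid and MLP weights are random stand-ins, and the real method uses a learned encoder with a local ensemble rather than a single nearest-cell lookup.

```python
import numpy as np

def liif_query(feat_grid, coord, w1, b1, w2, b2):
    """Predict an RGB value at a continuous coordinate in [0, 1]^2.

    feat_grid: (H, W, C) feature map from an encoder (random here).
    coord:     (2,) continuous (y, x) query coordinate.
    The nearest latent code and the relative offset to its cell center
    are concatenated and fed to a small MLP decoder.
    """
    H, W, C = feat_grid.shape
    # nearest feature cell (cell centers sit at (i + 0.5) / H, (j + 0.5) / W)
    i = min(int(coord[0] * H), H - 1)
    j = min(int(coord[1] * W), W - 1)
    z = feat_grid[i, j]
    # relative coordinate: the continuous part of the decoder input
    rel = np.array([coord[0] - (i + 0.5) / H, coord[1] - (j + 0.5) / W])
    x = np.concatenate([z, rel])
    h = np.maximum(w1 @ x + b1, 0.0)              # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # RGB squashed into (0, 1)

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 16))
w1, b1 = rng.normal(size=(32, 18)) * 0.1, np.zeros(32)
w2, b2 = rng.normal(size=(3, 32)) * 0.1, np.zeros(3)
rgb = liif_query(feat, np.array([0.37, 0.62]), w1, b1, w2, b2)
```

Because the query coordinate is continuous, the same decoder can be sampled on any grid, which is what enables arbitrary-resolution outputs.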
Estimating 3D hand and object pose from a single image is an extremely challenging problem: hands and objects are often self-occluded during interactions, and the annotations are scarce, as even humans cannot directly label ground-truths perfectly. To tackle these challenges, we propose a unified framework for estimating the poses with semi-supervised learning. We build a joint learning framework where we perform explicit contextual reasoning between hand and object representations. Going beyond the limited annotations in a single image, we leverage spatial-temporal...
While predicting robot grasps with parallel jaw grippers has been well studied and widely applied in manipulation tasks, generating natural human grasps with a multi-finger hand remains a very challenging problem. In this paper, we propose to generate human grasps given a 3D object in the world. Our key observation is that it is crucial to model the consistency between hand contact points and object contact regions. That is, we encourage the prior hand contact points to be close to the object surface, and the common object contact regions to be touched by the hand at the same time. Based on this hand-object consistency, we design novel...
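The two-way consistency described above can be illustrated with a simple symmetric loss: hand contact points should lie near the object surface, and likely object contact regions should be near the hand. This is only a sketch of the idea, not the paper's exact objective; `contact_prob` is a hypothetical per-point contact prior.

```python
import numpy as np

def contact_consistency_loss(hand_pts, obj_pts, contact_prob):
    """Illustrative two-way contact consistency term.

    hand_pts:     (N, 3) predicted hand contact points.
    obj_pts:      (M, 3) sampled object surface points.
    contact_prob: (M,) scores marking likely contact regions (assumed prior).
    """
    # pairwise distances between every hand point and every object point
    d = np.linalg.norm(hand_pts[:, None, :] - obj_pts[None, :, :], axis=-1)
    # pull each hand contact point toward the object surface
    hand_to_obj = d.min(axis=1).mean()
    # pull likely contact regions toward the hand, weighted by the prior
    obj_to_hand = (contact_prob * d.min(axis=0)).sum() / contact_prob.sum()
    return hand_to_obj + obj_to_hand
```

Minimizing both terms jointly discourages grasps that either float off the surface or ignore the regions an object is typically held by.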
Synthesizing 3D human motion plays an important role in many graphics applications as well as in understanding human activity. While many efforts have been made on generating realistic and natural motion, most approaches neglect the importance of modeling human-scene interactions and affordance. On the other hand, affordance reasoning (e.g., standing on the floor or sitting on a chair) has mainly been studied with static poses and gestures, and is rarely addressed for motion. In this paper, we propose to bridge human motion synthesis and scene affordance reasoning. We...
Extensive efforts have been made to improve the generalization ability of Reinforcement Learning (RL) methods via domain randomization and data augmentation. However, as more factors of variation are introduced during training, optimization becomes increasingly challenging and empirically may result in lower sample efficiency and unstable training. Instead of learning policies directly from augmented data, we propose SOft Data Augmentation (SODA), a method that decouples augmentation from policy learning....
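The decoupling can be sketched as an auxiliary consistency objective: the policy only ever sees clean observations, while a separate loss pulls the embedding of an augmented view toward the clean embedding. A minimal version, assuming placeholder `encode`, `project`, and `augment` callables (the real method uses learned networks and a stop-gradient on the clean target), might look like:

```python
import numpy as np

def soda_loss(encode, project, obs, augment):
    """Consistency term between clean and augmented embeddings.

    The policy gradient flows only through clean observations elsewhere;
    this auxiliary loss aligns the projected augmented embedding with the
    clean one (whose gradient would be stopped in a full implementation).
    """
    z_clean = encode(obs)                     # target embedding
    z_aug = project(encode(augment(obs)))     # prediction from augmented view
    # negative cosine similarity: minimized when the two embeddings align
    num = (z_clean * z_aug).sum()
    den = np.linalg.norm(z_clean) * np.linalg.norm(z_aug) + 1e-8
    return -num / den
```

Because the augmented view never feeds the policy loss directly, heavier augmentations perturb only this auxiliary term rather than destabilizing policy optimization.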
Videos typically record streaming, continuous visual data as discrete consecutive frames. Since the storage cost is expensive for videos of high fidelity, most of them are stored at a relatively low resolution and frame rate. Recent works on Space-Time Video Super-Resolution (STVSR) were developed to incorporate temporal interpolation and spatial super-resolution in a unified framework. However, they only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following...
Recent work has made significant progress on using implicit functions as a continuous representation for 3D rigid object shape reconstruction. However, much less effort has been devoted to modeling general articulated objects. Compared to rigid objects, articulated objects have higher degrees of freedom, which makes it hard to generalize to unseen shapes. To deal with the large shape variance, we introduce Articulated Signed Distance Functions (A-SDF) to represent articulated shapes with a disentangled latent space, with separate codes encoding...
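The disentanglement can be illustrated with a toy signed-distance decoder that conditions on two separate codes: holding the shape code fixed while varying the articulation code should move the same object through different articulation states. The dimensions and weights below are arbitrary stand-ins for a trained MLP, not the paper's architecture.

```python
import numpy as np

def asdf(x, shape_code, artic_code, W, b):
    """Toy A-SDF-style decoder: signed distance at 3D point x,
    conditioned on disentangled shape and articulation codes."""
    inp = np.concatenate([x, shape_code, artic_code])
    h = np.tanh(W[0] @ inp + b[0])          # single hidden layer
    return float(W[1] @ h + b[1])           # scalar signed distance

rng = np.random.default_rng(0)
# input dim = 3 (point) + 4 (shape code) + 1 (articulation code) = 8
W = [rng.normal(size=(8, 8)) * 0.5, rng.normal(size=(8,)) * 0.5]
b = [np.zeros(8), 0.0]
# same shape code, two articulation states -> two different surfaces
sdf_closed = asdf(np.ones(3), np.zeros(4), np.array([0.0]), W, b)
sdf_open = asdf(np.ones(3), np.zeros(4), np.array([1.0]), W, b)
```

At test time one can hold the shape code fixed and sweep the articulation code to synthesize unseen articulation states of a reconstructed instance.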
We propose to perform imitation learning for dexterous manipulation with a multi-finger robot hand from human demonstrations, and transfer the policy to a real robot hand. We introduce a novel single-camera teleoperation system that collects 3D demonstrations efficiently with only an iPad and a computer. One key contribution of our system is that we construct a customized robot hand for each user in the simulator, a manipulator resembling the same kinematic structure as the operator's hand. It provides an intuitive interface and avoids unstable human-robot hand retargeting during data...
... the real robot hand and rotate novel objects that are not present in training. Extensive ablations ...
Recent development of neural implicit functions has shown tremendous success on high-quality 3D shape reconstruction. However, most works divide the space into the inside and outside of the shape, which limits their representing power to single-layer, watertight shapes. This limitation leads to tedious data processing (converting non-watertight raw shapes to watertight ones) as well as the incapability of representing general object shapes in the real world. In this work, we propose a novel method to represent general shapes, including those with multi-layer...
We propose to learn to generate grasping motion for manipulation with a dexterous hand using implicit functions. With continuous time inputs, the model can generate a continuous and smooth grasping plan. We name the proposed model Continuous Grasping Function (CGF). CGF is learned via generative modeling with a Conditional Variational Autoencoder using 3D human demonstrations. We first convert large-scale human-object interaction trajectories to robot demonstrations via retargeting, and then use these demonstrations to train CGF. During inference, we perform sampling with different...
Learning to solve precision-based manipulation tasks from visual feedback using Reinforcement Learning (RL) could drastically reduce the engineering efforts required by traditional robot systems. However, performing fine-grained motor control from visual inputs alone is challenging, especially with a static third-person camera as often used in previous work. We propose a setting for robotic manipulation in which the agent receives both a static third-person view and an egocentric view from a camera mounted on the robot's wrist. While the third-person view is static, the egocentric camera lets the robot actively control its vision to aid in precise...
The recent advancements in the visual reasoning capabilities of large multimodal models (LMMs) and the semantic enrichment of 3D feature fields have expanded the horizons of robotic capabilities. These developments hold significant potential for bridging the gap between high-level reasoning from LMMs and low-level control policies utilizing 3D feature fields. In this work, we introduce LMM-3DP, a framework that integrates LMM planners and 3D skill policies. Our approach consists of three key perspectives: high-level planning, low-level control, and effective...
Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as the computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and a terminal value function to estimate...
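The planning loop described above can be sketched with a simple random-shooting planner: roll out candidate action sequences through the learned latent dynamics, sum per-step rewards over a short horizon, and bootstrap everything beyond the horizon with a terminal value estimate. This is a minimal sketch assuming placeholder `dynamics`, `reward`, and `value` callables; the actual method uses a more sophisticated sampling-based optimizer.

```python
import numpy as np

def plan(z0, dynamics, reward, value, horizon=3, n_samples=64, act_dim=2, seed=0):
    """Short-horizon shooting planner over a learned latent model.

    Samples random action sequences, scores each by short-horizon reward
    plus a terminal value at the final latent state, and returns the
    first action of the best-scoring sequence (MPC-style replanning).
    """
    rng = np.random.default_rng(seed)
    actions = rng.uniform(-1, 1, size=(n_samples, horizon, act_dim))
    returns = np.zeros(n_samples)
    for k in range(n_samples):
        z = z0
        for t in range(horizon):
            returns[k] += reward(z, actions[k, t])
            z = dynamics(z, actions[k, t])
        returns[k] += value(z)   # estimate of return beyond the horizon
    return actions[returns.argmax(), 0]
```

The terminal value is what keeps the horizon short and cheap: without it, the planner would be blind to any reward beyond the rollout.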
A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample efficiency and generalization through the additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for learning 3D representations for motor...
To enable general-purpose robots, we will require the robot to operate daily articulated objects as humans do. Current robot manipulation has heavily relied on using a parallel gripper, which restricts the robot to a limited set of objects. On the other hand, operating with a multi-finger hand allows better approximation of human behavior and more diverse manipulation. To this end, we propose a new benchmark called DexArt, which involves Dexterous manipulation with Articulated objects in a physical simulator. In our benchmark, we define multiple complex tasks in which the hand needs to manipulate diverse articulated objects within...
Recent works on generalizable NeRFs have shown promising results for novel view synthesis from single or few images. However, such models have rarely been applied to downstream tasks beyond synthesis, such as semantic understanding and parsing. In this paper, we propose a framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision foundation models (e.g., DINO, Latent Diffusion). FeatureNeRF lifts 2D pre-trained foundation models to 3D space via neural rendering, and then extracts deep features for 3D query points from the NeRF MLPs. Consequently, it allows one to map...
Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations. The intuitiveness and ease of use of the teleoperation system are crucial for ensuring high-quality, diverse, and scalable data. To achieve this, we propose an immersive teleoperation system, Open-TeleVision, that allows operators to actively perceive the robot's surroundings in a stereoscopic manner. Additionally, the system mirrors the operator's arm and hand movements on the robot, creating an immersive experience as if the operator's mind is transmitted to a robot embodiment....
This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild. Without requiring annotations of meshes, 2D keypoints, or camera poses for each video frame, we pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video. We first learn a category-specific model from a collection of single-view images of the same category that jointly predicts the shape, texture, and camera pose of an image. Then, at inference time, we adapt...