- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- 3D Shape Modeling and Analysis
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- 3D Surveying and Cultural Heritage
- Robot Manipulation and Learning
- Advanced Image and Video Retrieval Techniques
- Reinforcement Learning in Robotics
- Human Pose and Action Recognition
- Optical Measurement and Interference Techniques
- Robotic Path Planning Algorithms
- Constraint Satisfaction and Optimization
- Historical Geography and Cartography
- Hand Gesture Recognition Systems
- Image Processing Techniques and Applications
- Adversarial Robustness in Machine Learning
- Speech and Dialogue Systems
- Industrial Vision Systems and Defect Detection
- Human-Automation Interaction and Safety
- Domain Adaptation and Few-Shot Learning
- Robotics and Automated Systems
- Cell Image Analysis Techniques
- Soft Robotics and Applications
- Image Processing and 3D Reconstruction
Shanghai University
2024
Peking University
2023-2024
Beijing Academy of Artificial Intelligence
2023
Chang'an University
2023
National University of Defense Technology
2019-2022
Online reconstruction based on RGB-D sequences has thus far been restrained to relatively slow camera motions (<1 m/s). Under very fast motion (e.g., 3 m/s), the reconstruction can easily crumble even for state-of-the-art methods. Fast motion brings two challenges to depth fusion: 1) the high nonlinearity of camera pose optimization due to large inter-frame rotations, and 2) the lack of reliably trackable features due to motion blur. We propose to tackle the difficulties of fast-motion camera tracking in the absence of inertial measurements using random optimization, ...
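The random-optimization idea above can be illustrated with a deliberately simplified 2D toy: sample pose perturbations around the current best estimate, keep the best-scoring candidate, and anneal the sampling spread. The alignment objective and all names below are illustrative stand-ins, not the paper's actual particle-based optimizer:

```python
import numpy as np

def transform(points, pose):
    """Apply a 2D rigid transform (theta, tx, ty) to an (N, 2) point set."""
    theta, tx, ty = pose
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return points @ R.T + np.array([tx, ty])

def random_pose_optimization(src, dst, iters=60, n_samples=200, seed=0):
    """Derivative-free pose search: sample perturbations around the best
    pose so far, keep the lowest-residual candidate, shrink the spread.
    A toy stand-in for sampling-based fast-motion tracking."""
    rng = np.random.default_rng(seed)
    best = np.zeros(3)
    best_err = np.mean(np.sum((transform(src, best) - dst) ** 2, axis=1))
    sigma = np.array([0.5, 0.5, 0.5])  # search radius, annealed over time
    for _ in range(iters):
        candidates = best + rng.normal(0.0, 1.0, (n_samples, 3)) * sigma
        errs = [np.mean(np.sum((transform(src, c) - dst) ** 2, axis=1))
                for c in candidates]
        i = int(np.argmin(errs))
        if errs[i] < best_err:
            best, best_err = candidates[i], errs[i]
        sigma *= 0.9  # anneal the sampling spread
    return best, best_err

# Ground-truth motion: rotate by 0.4 rad, translate by (0.3, -0.2).
rng = np.random.default_rng(1)
src = rng.uniform(-1, 1, (100, 2))
dst = transform(src, np.array([0.4, 0.3, -0.2]))
pose, err = random_pose_optimization(src, dst)
```

Because no gradients of the residual are needed, such a sampler tolerates the highly nonlinear objectives that large inter-frame rotations produce, at the cost of many function evaluations.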
High-dimensional nonlinear state estimation is at the heart of inertial-aided navigation systems (INS). Traditional methods usually rely on good initialization and have difficulty handling large inter-frame transformations due to fast camera motion. We opt to tackle these challenges by solving the depth-inertial odometry (DIO) problem with random optimization. To address the exponentially increased amount of candidate states sampled from the high-dimensional state space, we propose a highly efficient variant ...
In this work, we tackle 6-DoF grasp detection for transparent and specular objects, which is an important yet challenging problem in vision-based robotic systems, due to the failure of depth cameras in sensing their geometry. We, for the first time, propose a multiview RGB-based network, GraspNeRF, that leverages a generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping in clutter. Compared to existing NeRF-based 3-DoF grasp methods that rely on densely captured input images and time-consuming ...
Online semantic 3D segmentation in company with real-time RGB-D reconstruction poses special challenges, such as how to perform convolution directly over the progressively fused geometric data, and how to smartly fuse information from frame to frame. We propose a novel fusion-aware point convolution which operates on the surface being reconstructed and effectively exploits inter-frame correlation for high-quality feature learning. This is enabled by a dedicated dynamic data structure that organizes the online acquired point cloud ...
Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Considering that this task happens in 3D space, a 3D-aware agent can advance its ObjectNav capability via learning from fine-grained spatial information. However, leveraging 3D scene representation can be prohibitively unpractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose ...
We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeting at the recognition and segmentation of objects from the scene. Our algorithm is built on top of a volumetric depth fusion framework (e.g., KinectFusion) and performs real-time voxel-based semantic labeling over the reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D ...
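As a loose illustration of view planning over a discrete score field, the toy sketch below greedily picks the unvisited candidate pose with the highest score; the grid, the scores, and the function names are all hypothetical, not the paper's VSF formulation:

```python
import numpy as np

def next_best_view(scores, visited):
    """Pick the unvisited grid pose with the highest viewing score."""
    masked = np.where(visited, -np.inf, scores)
    return np.unravel_index(np.argmax(masked), scores.shape)

# Hypothetical 3x3 grid of candidate camera poses; each score could mix
# recognition uncertainty and coverage gain (weighting is illustrative).
scores = np.array([[0.2, 0.9, 0.1],
                   [0.5, 0.8, 0.3],
                   [0.7, 0.4, 0.6]])
visited = np.zeros_like(scores, dtype=bool)
order = []
for _ in range(3):
    nbv = next_best_view(scores, visited)
    order.append(nbv)
    visited[nbv] = True  # scan from this pose, then mark it visited
```

In a real system the scores would be re-estimated after every scan as the reconstruction and labeling improve, rather than held fixed as here.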
We introduce MIPS-Fusion, a robust and scalable online RGB-D reconstruction method based on a novel neural implicit representation: the multi-implicit-submap. Different from existing methods, which lack either flexibility with a single map or scalability due to the extra storage of feature grids, we propose a pure neural representation tackling both difficulties with a divide-and-conquer design. In our method, submaps are incrementally allocated alongside the scanning trajectory and efficiently learned with local bundle adjustments. The submaps can be ...
A practical navigation agent must be capable of handling a wide range of interaction demands, such as following instructions, searching for objects, answering questions, tracking people, and more. Existing models for embodied navigation fall short of serving as generalists in the real world, as they are often constrained by specific task configurations or pre-defined maps with discretized waypoints. In this work, we present Uni-NaVid, the first video-based vision-language-action (VLA) model designed to unify diverse navigation tasks ...
Object goal navigation (ObjectNav) is a fundamental task of embodied AI that requires the agent to find a target object in unseen environments. This task is particularly challenging as it demands both perceptual and cognitive processes for effective perception and decision-making. While the perception side has gained significant progress, powered by rapidly developed visual foundation models, the cognitive side remains limited to either implicitly learning from massive demonstrations or explicitly leveraging pre-defined heuristic rules ...
Open-vocabulary 3D instance segmentation is cutting-edge for its ability to segment instances without predefined categories. However, progress in 3D lags behind its 2D counterpart due to limited annotated data. To address this, recent works first generate 2D open-vocabulary masks through 2D models and then merge them into 3D instances based on metrics calculated between two neighboring frames. In contrast to these local metrics, we propose a novel metric, view consensus rate, to enhance the utilization of multi-view ...
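A toy rendition of a multi-view consensus metric may help make the idea concrete. The point-id sets, frame structure, and function name below are illustrative assumptions, not the paper's exact definition:

```python
def view_consensus_rate(mask_a_points, mask_b_points, frame_masks):
    """Toy multi-view consensus: over the frames that observe both
    candidate 3D masks, count how often a single 2D mask in that frame
    contains points from both sets (suggesting one underlying instance)."""
    supporting, observing = 0, 0
    for frame in frame_masks:  # each frame: list of 2D masks as point-id sets
        sees_a = any(m & mask_a_points for m in frame)
        sees_b = any(m & mask_b_points for m in frame)
        if sees_a and sees_b:
            observing += 1
            # consensus: some single 2D mask covers points from both sets
            if any(m & mask_a_points and m & mask_b_points for m in frame):
                supporting += 1
    return supporting / observing if observing else 0.0

mask_a = {1, 2, 3}          # 3D point ids of candidate mask A
mask_b = {4, 5}             # 3D point ids of candidate mask B
frames = [
    [{1, 2, 4}],            # one 2D mask spans both -> consensus
    [{1, 2}, {4, 5}],       # both visible but in separate masks -> no consensus
    [{1, 3}],               # only A visible -> frame does not count
]
rate = view_consensus_rate(mask_a, mask_b, frames)
```

Unlike a metric computed only between two neighboring frames, this rate aggregates evidence over every frame that sees both candidates; two masks would then be merged when the rate exceeds some threshold.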
Vision-and-Language Navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions. In this field, generalization is a long-standing challenge, either to out-of-distribution scenes or from Sim to Real. In this paper, we propose NaVid, a video-based large vision-language model (VLM), to mitigate such a generalization gap. NaVid makes the first endeavour to showcase the capability of VLMs to achieve state-of-the-art navigation performance ...
Recent research on Vision-and-Language Navigation (VLN) indicates that agents suffer from poor generalization in unseen environments due to the lack of realistic training environments and high-quality path-instruction pairs. Most existing methods for constructing realistic navigation scenes have high costs, and the extension of instructions mainly relies on predefined templates or rules, lacking adaptability. To alleviate the issue, we propose InstruGen, a VLN path-instruction pairs generation paradigm. Specifically, we use YouTube house tour videos ...
Choosing appropriate hyperparameters plays a crucial role in the success of neural networks, as hyperparameters directly control the behavior and performance of training algorithms. To obtain efficient tuning, Bayesian optimization methods based on Gaussian process (GP) models are widely used. Despite numerous applications in deep learning, existing methodologies are developed under the convenient but restrictive assumption that the tuning parameters are independent of each other. However, tuning parameters with conditional dependence are common ...
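For background, a minimal Bayesian-optimization loop with a GP surrogate and expected improvement, in exactly the independent-parameter setting the abstract criticizes, can be sketched as follows; the toy objective and all names are assumptions:

```python
import math
import numpy as np

def rbf(A, B, ls=0.3):
    """Squared-exponential (RBF) kernel between two (n, d) point sets."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and std at query points Xs given data (X, y)."""
    L = np.linalg.cholesky(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf(Xs, Xs) - v.T @ v), 1e-12, None)
    return Ks.T @ alpha, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    z = (best - mu) / sigma
    Phi = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    phi = np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    return (best - mu) * Phi + sigma * phi

def f(x):  # toy "validation loss" of a single hyperparameter
    return np.sin(3 * x) + 0.5 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (3, 1))           # random initial evaluations
y = f(X[:, 0])
grid = np.linspace(-2, 2, 201)[:, None]  # candidate hyperparameter values
for _ in range(20):
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[int(np.argmax(expected_improvement(mu, sigma, y.min())))]
    X = np.vstack([X, x_next[None, :]])
    y = np.append(y, f(x_next[0]))
best_x = X[np.argmin(y), 0]
```

Note the kernel treats each input dimension symmetrically and independently; modeling conditionally dependent tuning parameters, as the abstract proposes, requires going beyond this standard setup.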
Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity of part-based interactions required for flexible ...
Camera placement is crucial in multi-camera systems such as virtual reality, autonomous driving, and high-quality reconstruction. The camera placement challenge lies in the nonlinear nature of the high-dimensional parameters and the unavailability of gradients for target functions like coverage and visibility. Consequently, most existing methods tackle this challenge by leveraging non-gradient-based optimization methods. In this work, we present a hybrid approach that incorporates both gradient-based and non-gradient-based methods. This design allows our method ...
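A toy hybrid scheme in the same spirit combines a non-gradient global stage (random proposals over candidate placements) with gradient-based local refinement of a smoothed coverage surrogate. The coverage model and every name below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def soft_coverage(cams, targets, radius=0.6):
    """Differentiable surrogate for coverage: each target earns credit
    that decays smoothly with distance to its nearest camera."""
    d2 = np.sum((targets[:, None, :] - cams[None, :, :]) ** 2, axis=2)
    return np.mean(np.max(np.exp(-d2 / radius**2), axis=1))

def coverage_grad(cams, targets, eps=1e-4):
    """Central finite-difference gradient w.r.t. camera positions."""
    g = np.zeros_like(cams)
    for i in np.ndindex(cams.shape):
        e = np.zeros_like(cams)
        e[i] = eps
        g[i] = (soft_coverage(cams + e, targets)
                - soft_coverage(cams - e, targets)) / (2 * eps)
    return g

def hybrid_place(targets, n_cams=2, restarts=5, steps=150, lr=0.3, seed=0):
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(restarts):            # global: random placement proposals
        cams = targets[rng.choice(len(targets), n_cams, replace=False)].copy()
        for _ in range(steps):           # local: gradient ascent on coverage
            cams = cams + lr * coverage_grad(cams, targets)
        s = soft_coverage(cams, targets)
        if s > best_score:
            best, best_score = cams, s
    return best, best_score

# Two target clusters; two cameras should settle near the cluster centers.
targets = np.vstack([
    np.random.default_rng(1).normal([-0.7, 0.0], 0.05, (20, 2)),
    np.random.default_rng(2).normal([0.7, 0.0], 0.05, (20, 2)),
])
cams, score = hybrid_place(targets)
```

The smoothed exponential falloff stands in for the non-differentiable coverage/visibility functions the abstract mentions: the global stage escapes poor basins, while the gradient stage refines placements the random proposals alone would miss.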