Kaichen Zhou

ORCID: 0009-0008-7890-8736
Research Areas
  • Advanced Vision and Imaging
  • Robot Manipulation and Learning
  • Robotics and Sensor-Based Localization
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Prostate Cancer Treatment and Research
  • 3D Shape Modeling and Analysis
  • Computer Graphics and Visualization Techniques
  • Robotics and Automated Systems
  • Optical measurement and interference techniques
  • Remote Sensing and LiDAR Applications
  • Image Enhancement Techniques
  • AI-based Problem Solving and Planning
  • Prostate Cancer Diagnosis and Treatment
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Malware Detection Techniques
  • Image Retrieval and Classification Techniques
  • Teleoperation and Haptic Systems
  • Autonomous Vehicle Technology and Safety
  • Immunotherapy and Immune Responses
  • Image Processing Techniques and Applications
  • Optical Systems and Laser Technology
  • Digital Image Processing Techniques

Xuzhou Medical College
2024-2025

University of Oxford
2022-2024

Shenzhen Institutes of Advanced Technology
2022-2024

University of Chinese Academy of Sciences
2024

Chinese Academy of Sciences
2022

Robotic research encounters a significant hurdle when it comes to the intricate task of grasping objects that come in various shapes, materials, and textures. Unlike many prior investigations that leaned heavily on specialized point-cloud cameras or abundant RGB visual data to gather 3D insights for object-grasping missions, this paper introduces a pioneering approach called RGBGrasp. This method depends on a limited set of RGB views to perceive surroundings containing transparent and specular objects and to achieve accurate grasping....

10.1109/lra.2024.3396101 article EN IEEE Robotics and Automation Letters 2024-05-02

This study aimed to explore the factors affecting pathologic complete response (pCR) and prognosis of locally advanced prostate cancer (LAPCa) or metastatic prostate cancer (mPCa) treated with neoadjuvant androgen deprivation therapy (ADT) plus abiraterone acetate (AA). This retrospective study enrolled patients diagnosed with LAPCa or mPCa who were divided into three groups based on the prostate-specific antigen (PSA) nadir following ADT plus AA: group 1 (PSA ≤ 0.2 ng/ml), group 2 (0.2-4.0 ng/ml), and group 3 (> 4.0 ng/ml). Univariate and multivariate logistic...

10.1186/s40001-025-02521-7 article EN cc-by-nc-nd European journal of medical research 2025-04-04

A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear...

10.48550/arxiv.2406.04339 preprint EN arXiv (Cornell University) 2024-06-06

Neural Radiance Fields (NeRF) have demonstrated remarkable 3D reconstruction capabilities with dense-view images. However, their performance significantly deteriorates under sparse-view settings. We observe that learning the consistency of pixels among different views is crucial for improving reconstruction quality in such cases. In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize both multi-view and single-view consistency among pixels. Specifically, ConsistentNeRF employs depth-derived...

10.48550/arxiv.2305.11031 preprint EN other-oa arXiv (Cornell University) 2023-01-01
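The multi-view consistency that ConsistentNeRF exploits rests on a standard geometric fact: a pixel, lifted to 3D with its predicted depth, should land on the same-colored point when projected into another view. A minimal sketch of that reprojection check follows; it assumes a shared pinhole intrinsics matrix and hypothetical function names, and is not the paper's implementation.

```python
import numpy as np

def reproject(uv, depth, K, T_src_to_tgt):
    """Lift a source-view pixel to 3D using its predicted depth, then
    project it into the target view.

    uv: (2,) pixel coordinates in the source view
    depth: scalar depth prediction at that pixel
    K: (3, 3) shared pinhole camera intrinsics
    T_src_to_tgt: (4, 4) relative camera pose (source -> target)
    """
    x = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0]) * depth  # back-project
    x_h = T_src_to_tgt @ np.append(x, 1.0)                        # change frames
    p = K @ x_h[:3]                                               # project
    return p[:2] / p[2]                                           # perspective divide

def multiview_consistency_loss(rgb_src, rgb_tgt_at_reproj):
    """Penalize color disagreement between a pixel and its depth-derived match."""
    return float(np.mean(np.abs(rgb_src - rgb_tgt_at_reproj)))
```

With an identity pose and unit depth the pixel maps to itself, which makes the sanity check easy: any residual color difference then comes purely from depth or appearance error.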

Location information is pivotal for the automation and intelligence of terminal devices in edge-cloud IoT systems, such as autonomous vehicles and augmented reality. However, achieving reliable positioning across diverse applications remains challenging due to significant training costs and the necessity of densely collected data. To tackle these issues, we have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc. The proposed model demonstrates exceptional...

10.48550/arxiv.2408.09680 preprint EN arXiv (Cornell University) 2024-08-18
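The selective SSM at the heart of Mamba-style models is a linear recurrence whose parameters depend on the input at each step. A minimal, unoptimized scan is sketched below to illustrate the mechanism; shapes and the discretization choices are simplifying assumptions, not the MambaLoc architecture.

```python
import numpy as np

def selective_scan(x, dt, A, B, C):
    """Minimal selective state-space scan (illustrative, not the fused Mamba kernel).

    x:  (L,) scalar input sequence
    dt: (L,) input-dependent step sizes (the 'selective' part)
    A:  (N,) diagonal of the continuous state matrix
    B, C: (L, N) input-dependent projections
    Runs the recurrence h_t = Abar_t * h_{t-1} + Bbar_t * x_t, y_t = C_t . h_t.
    """
    h = np.zeros(A.shape[0])
    y = np.empty_like(x)
    for t in range(len(x)):
        Abar = np.exp(dt[t] * A)   # zero-order-hold discretization of A
        Bbar = dt[t] * B[t]        # simple Euler discretization of B
        h = Abar * h + Bbar * x[t]
        y[t] = C[t] @ h
    return y
```

Because the scan is linear in the state, the whole sequence can also be computed with a parallel prefix scan, which is what gives these models their near-linear sequence-length scaling.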

Despite advancements in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on assumptions about a static world. In this paper, we present MGDepth, a Motion-Guided Cost Volume Depth Net, to achieve precise depth estimation for both dynamic objects and static backgrounds, all while maintaining computational efficiency. To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a novel static reference frame. This frame is then utilized to build a motion-guided cost volume...

10.48550/arxiv.2312.15268 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Although significant progress has been made in the field of 2D-based interactive editing, fine-grained 3D-based editing remains relatively unexplored. This limitation can be attributed to two main challenges: the lack of an efficient 3D representation robust to different modifications and the absence of an effective 3D segmentation method. In this paper, we introduce a novel fine-grained editing algorithm with radiance fields, which we refer to as SERF. Our method entails creating a neural mesh representation by integrating multi-view algorithms with pre-trained...

10.48550/arxiv.2312.15856 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The development of Neural Radiance Fields (NeRFs) has provided a potent representation for encapsulating the geometric and appearance characteristics of 3D scenes. Enhancing the capabilities of NeRFs in open-vocabulary semantic perception tasks has been a recent focus. However, current methods that extract semantics directly from Contrastive Language-Image Pretraining (CLIP) for semantic field learning encounter difficulties due to the noisy and view-inconsistent semantics provided by CLIP. To tackle these limitations, we propose OV-NeRF, which...

10.48550/arxiv.2402.04648 preprint EN arXiv (Cornell University) 2024-02-07

Despite the advancements in deep learning for camera relocalization tasks, obtaining ground truth pose labels required for the training process remains a costly endeavor. While current weakly supervised methods excel in lightweight label generation, their performance notably declines in scenarios with sparse views. In response to this challenge, we introduce WSCLoc, a system capable of being customized to various deep learning-based relocalization models to enhance their performance under weakly-supervised and sparse-view conditions. This is realized with two...

10.48550/arxiv.2403.15272 preprint EN arXiv (Cornell University) 2024-03-22

Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failed actions is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities across various tasks. To unleash general MLLMs as an end-to-end robotic agent, we introduce Self-Corrected (SC)-MLLM, equipping our model...

10.48550/arxiv.2405.17418 preprint EN arXiv (Cornell University) 2024-05-27

Neural Radiance Fields (NeRF) are driving the development of 3D reconstruction technology. Several NeRF variants have been proposed to improve rendering accuracy and speed. One of the most significant variants, TensoRF, uses a 4D tensor to model the radiance field, resulting in improved speed. However, rendering quality remains limited. This study presents an improved TensoRF that addresses the aforementioned issues by reconstructing its multilayer perceptron network. Increasing the number of neurons in the input and network layers improves render...

10.1117/12.3031943 article EN 2024-06-06

This paper addresses the challenge of reconstructing surfaces from sparse view inputs, where ambiguity and occlusions due to missing information pose significant hurdles. We present a novel approach, named EpiS, that incorporates epipolar information into the reconstruction process. Existing methods in sparse-view neural surface learning have mainly focused on mean and variance considerations using cost volumes for feature extraction. In contrast, our method aggregates coarse cost volume features extracted from multiple...

10.48550/arxiv.2406.04301 preprint EN arXiv (Cornell University) 2024-06-06

The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic performance accordingly. However, these methods typically focus on high-level planning corrections using an additional MLLM, with limited utilization of failed samples to correct low-level contact poses. To address this gap, we propose Autonomous Interactive...

10.48550/arxiv.2406.11548 preprint EN arXiv (Cornell University) 2024-06-17

Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) we are the first to study the occlusion level of grasping, 2) we set up an evaluation benchmark consisting of large-scale synthetic data and...

10.48550/arxiv.2407.06168 preprint EN arXiv (Cornell University) 2024-07-08

Recent advancements in industrial anomaly detection have been hindered by the lack of realistic datasets that accurately represent real-world conditions. Existing algorithms are often developed and evaluated using idealized datasets, which deviate significantly from real-life scenarios characterized by environmental noise and data corruption such as fluctuating lighting conditions, variable object poses, and unstable camera positions. To address this gap, we introduce the Realistic Anomaly Detection (RAD)...

10.48550/arxiv.2410.00713 preprint EN arXiv (Cornell University) 2024-10-01

10.1109/iros58592.2024.10801669 article EN 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024-10-14

Deep learning has led to great progress in the detection of mobile (i.e. movement-capable) objects in urban driving scenes in recent years. Supervised approaches typically require the annotation of large training sets; there has thus been interest in leveraging weakly, semi- or self-supervised methods to avoid this, with much success. Whilst weakly and semi-supervised methods require some annotation, self-supervised methods have used cues such as motion to relieve the need for annotation altogether. However, a complete absence of annotation typically degrades their performance, and ambiguities that...

10.48550/arxiv.2209.10471 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Human-robot interaction (HRI) research using vision for robot teleoperation is closely related to the development of the field of artificial intelligence. There are many methods for imitating human postures with humanoid robots. However, it is not easy to fully parameterize human movements for collaborative robots because of the difference in morphology between humans and robots. This paper proposes a human-robot interaction method based on digital twin technology to try to solve this problem. Using this method, an operator can control a robot at a distance to accomplish...

10.1109/tocs56154.2022.10016008 article EN 2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS) 2022-12-11

Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of naive fusion in feature space through summation or concatenation, which does not take into account the different...

10.48550/arxiv.2003.07289 preprint EN other-oa arXiv (Cornell University) 2020-01-01
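The shortcoming of naive summation or concatenation noted above is that it weights every modality equally even when one input is degraded. One common alternative, sketched here purely as an illustration (the scoring mechanism and names are assumptions, not this paper's method), is to gate each modality by a learned reliability score:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def gated_fusion(feats, scores):
    """Fuse per-modality features with reliability gates.

    feats:  (M, D) one feature vector per modality (e.g. RGB, depth)
    scores: (M,) scalar confidence logits; a degraded modality gets a low score
    Unlike plain summation, the softmax gate can down-weight a corrupted input.
    """
    w = softmax(scores)                  # (M,) weights summing to 1
    return (w[:, None] * feats).sum(axis=0)
```

If the depth channel is missing or noisy, driving its score down makes the fused feature collapse gracefully toward the RGB feature instead of being dragged by garbage values.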

A truly generalizable approach to rigid segmentation and motion estimation is fundamental to the 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two interconnected, lightweight heads. These heads predict segmentation masks using point-level invariant features and estimate motion from equivariant features, all without the need for category information....

10.48550/arxiv.2306.05584 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Robotic research encounters a significant hurdle when it comes to the intricate task of grasping objects that come in various shapes, materials, and textures. Unlike many prior investigations that leaned heavily on specialized point-cloud cameras or abundant RGB visual data to gather 3D insights for object-grasping missions, this paper introduces a pioneering approach called RGBGrasp. This method depends on a limited set of RGB views to perceive surroundings containing transparent and specular objects and to achieve accurate grasping....

10.48550/arxiv.2311.16592 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames. However, such methods neither fully exploit 3D point-wise geometric correspondences, nor effectively tackle the ambiguities in photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes the Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework that can consider 3D spatial information,...

10.48550/arxiv.2209.06351 preprint EN other-oa arXiv (Cornell University) 2022-01-01
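The photometric relation that self-supervised depth methods rely on compares a target frame against source frames warped into the target view using the predicted depth and pose. One widely used way to soften the occlusion ambiguity mentioned above is the per-pixel minimum over source frames (popularized by Monodepth2); the sketch below shows that trick in isolation and is not the DevNet objective itself.

```python
import numpy as np

def min_reprojection_loss(I_tgt, warped_srcs):
    """Photometric error against each warped source frame, taking the
    pixel-wise minimum so occluded pixels can pick the source frame in
    which they are actually visible.

    I_tgt: (H, W) target frame (grayscale for brevity)
    warped_srcs: list of (H, W) source frames warped into the target view
    """
    errs = np.stack([np.abs(I_tgt - w) for w in warped_srcs])  # (S, H, W)
    return float(errs.min(axis=0).mean())  # min over sources, mean over pixels
```

A pixel occluded in one source frame but visible in another contributes its small (correct) error rather than the large occlusion-induced one, so the loss stops penalizing depths that are actually right.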