Kaichen Zhou

ORCID: 0009-0008-7890-8736
Research Areas
  • Advanced Vision and Imaging
  • Robot Manipulation and Learning
  • Robotics and Sensor-Based Localization
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Prostate Cancer Treatment and Research
  • 3D Shape Modeling and Analysis
  • Computer Graphics and Visualization Techniques
  • Robotics and Automated Systems
  • Optical measurement and interference techniques
  • Remote Sensing and LiDAR Applications
  • Image Enhancement Techniques
  • AI-based Problem Solving and Planning
  • Prostate Cancer Diagnosis and Treatment
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Malware Detection Techniques
  • Image Retrieval and Classification Techniques
  • Teleoperation and Haptic Systems
  • Autonomous Vehicle Technology and Safety
  • Immunotherapy and Immune Responses
  • Image Processing Techniques and Applications
  • Optical Systems and Laser Technology
  • Digital Image Processing Techniques

Xuzhou Medical College
2024-2025

University of Oxford
2022-2024

Shenzhen Institutes of Advanced Technology
2022-2024

University of Chinese Academy of Sciences
2024

Chinese Academy of Sciences
2022

Robotic research encounters a significant hurdle when it comes to the intricate task of grasping objects that come in various shapes, materials, and textures. Unlike many prior investigations that leaned heavily on specialized point-cloud cameras or abundant RGB visual data to gather 3D insights for object-grasping missions, this paper introduces a pioneering approach called RGBGrasp. This method depends on a limited set of RGB views to perceive surroundings containing transparent and specular objects and to achieve accurate grasping....

10.1109/lra.2024.3396101 article EN IEEE Robotics and Automation Letters 2024-05-02

This study aimed to explore the factors affecting pathologic complete response (pCR) and prognosis of locally advanced prostate cancer (LAPCa) or metastatic prostate cancer (mPCa) treated with neoadjuvant androgen deprivation therapy (ADT) plus abiraterone acetate (AA). This retrospective study enrolled patients diagnosed with LAPCa or mPCa who were divided into three groups based on the prostate-specific antigen (PSA) nadir following ADT plus AA: group 1 (PSA ≤ 0.2 ng/ml), group 2 (0.2-4.0 ng/ml), and group 3 (> 4.0 ng/ml). Univariate and multivariate logistic...

10.1186/s40001-025-02521-7 article EN cc-by-nc-nd European journal of medical research 2025-04-04

A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear...

10.48550/arxiv.2406.04339 preprint EN arXiv (Cornell University) 2024-06-06

Neural Radiance Fields (NeRF) have demonstrated remarkable 3D reconstruction capabilities with dense-view images. However, their performance significantly deteriorates under sparse-view settings. We observe that learning the consistency of pixels among different views is crucial for improving reconstruction quality in such cases. In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize both multi-view and single-view consistency among pixels. Specifically, ConsistentNeRF employs depth-derived...

10.48550/arxiv.2305.11031 preprint EN other-oa arXiv (Cornell University) 2023-01-01
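The multi-view consistency that ConsistentNeRF exploits rests on a standard geometric fact: a pixel, lifted to 3D with its predicted depth, should land on the same-colored point when projected into another view. A minimal sketch of that reprojection check follows; it assumes a shared pinhole intrinsics matrix and hypothetical function names, and is not the paper's implementation.

```python
import numpy as np

def reproject(uv, depth, K, T_src_to_tgt):
    """Lift a source-view pixel to 3D using its predicted depth, then
    project it into the target view.

    uv: (2,) pixel coordinates in the source view
    depth: scalar depth prediction at that pixel
    K: (3, 3) shared pinhole camera intrinsics
    T_src_to_tgt: (4, 4) relative camera pose (source -> target)
    """
    x = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0]) * depth  # back-project
    x_h = T_src_to_tgt @ np.append(x, 1.0)                        # change frames
    p = K @ x_h[:3]                                               # project
    return p[:2] / p[2]                                           # perspective divide

def multiview_consistency_loss(rgb_src, rgb_tgt_at_reproj):
    """Penalize color disagreement between a pixel and its depth-derived match."""
    return float(np.mean(np.abs(rgb_src - rgb_tgt_at_reproj)))
```

With an identity pose and unit depth the pixel maps to itself, which makes the sanity check easy: any residual color difference then comes purely from depth or appearance error.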

Location information is pivotal for the automation and intelligence of terminal devices in edge-cloud IoT systems, such as autonomous vehicles and augmented reality. However, achieving reliable positioning across diverse applications remains challenging due to significant training costs and the necessity of densely collected data. To tackle these issues, we have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc. The proposed model demonstrates exceptional...

10.48550/arxiv.2408.09680 preprint EN arXiv (Cornell University) 2024-08-18
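The selective SSM at the heart of Mamba-style models is a linear recurrence whose parameters depend on the input at each step. A minimal, unoptimized scan is sketched below to illustrate the mechanism; shapes and the discretization choices are simplifying assumptions, not the MambaLoc architecture.

```python
import numpy as np

def selective_scan(x, dt, A, B, C):
    """Minimal selective state-space scan (illustrative, not the fused Mamba kernel).

    x:  (L,) scalar input sequence
    dt: (L,) input-dependent step sizes (the 'selective' part)
    A:  (N,) diagonal of the continuous state matrix
    B, C: (L, N) input-dependent projections
    Runs the recurrence h_t = Abar_t * h_{t-1} + Bbar_t * x_t, y_t = C_t . h_t.
    """
    h = np.zeros(A.shape[0])
    y = np.empty_like(x)
    for t in range(len(x)):
        Abar = np.exp(dt[t] * A)   # zero-order-hold discretization of A
        Bbar = dt[t] * B[t]        # simple Euler discretization of B
        h = Abar * h + Bbar * x[t]
        y[t] = C[t] @ h
    return y
```

Because the scan is linear in the state, the whole sequence can also be computed with a parallel prefix scan, which is what gives these models their near-linear sequence-length scaling.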

Despite advancements in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on assumptions about a static world. In this paper, we present MGDepth, a Motion-Guided Cost Volume Depth Net, to achieve precise depth estimation for both dynamic objects and static backgrounds, all while maintaining computational efficiency. To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a novel static reference frame. This frame is then utilized to build a motion-guided cost volume...

10.48550/arxiv.2312.15268 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Although significant progress has been made in the field of 2D-based interactive editing, fine-grained 3D-based editing remains relatively unexplored. This limitation can be attributed to two main challenges: the lack of an efficient 3D representation robust to different modifications and the absence of an effective 3D segmentation method. In this paper, we introduce a novel fine-grained editing algorithm with radiance fields, which we refer to as SERF. Our method entails creating a neural mesh representation by integrating multi-view algorithms with pre-trained...

10.48550/arxiv.2312.15856 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The development of Neural Radiance Fields (NeRFs) has provided a potent representation for encapsulating the geometric and appearance characteristics of 3D scenes. Enhancing the capabilities of NeRFs in open-vocabulary semantic perception tasks has been a recent focus. However, current methods that extract semantics directly from Contrastive Language-Image Pretraining (CLIP) for semantic field learning encounter difficulties due to the noisy and view-inconsistent semantics provided by CLIP. To tackle these limitations, we propose OV-NeRF, which...

10.48550/arxiv.2402.04648 preprint EN arXiv (Cornell University) 2024-02-07

Despite the advancements in deep learning for camera relocalization tasks, obtaining ground truth pose labels required for the training process remains a costly endeavor. While current weakly supervised methods excel in lightweight label generation, their performance notably declines in scenarios with sparse views. In response to this challenge, we introduce WSCLoc, a system capable of being customized to various deep learning-based relocalization models to enhance their performance under weakly-supervised and sparse-view conditions. This is realized with two...

10.48550/arxiv.2403.15272 preprint EN arXiv (Cornell University) 2024-03-22

Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failed actions is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities across various tasks. To unleash general MLLMs as an end-to-end robotic agent, we introduce Self-Corrected (SC)-MLLM, equipping our model...

10.48550/arxiv.2405.17418 preprint EN arXiv (Cornell University) 2024-05-27

Neural Radiance Fields (NeRF) are driving the development of 3D reconstruction technology. Several NeRF variants have been proposed to improve rendering accuracy and speed. One of the most significant variants, TensoRF, uses a 4D tensor to model the radiance field, resulting in improved speed. However, rendering quality remains limited. This study presents an improved TensoRF that addresses the aforementioned issues by reconstructing its multilayer perceptron network. Increasing the number of neurons in the input and network layers improves render...

10.1117/12.3031943 article EN 2024-06-06

This paper addresses the challenge of reconstructing surfaces from sparse view inputs, where ambiguity and occlusions due to missing information pose significant hurdles. We present a novel approach, named EpiS, that incorporates epipolar information into the reconstruction process. Existing methods in sparse-view neural surface learning have mainly focused on mean and variance considerations using cost volumes for feature extraction. In contrast, our method aggregates coarse cost volume features extracted from multiple...

10.48550/arxiv.2406.04301 preprint EN arXiv (Cornell University) 2024-06-06

The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic performance accordingly. However, these methods typically focus on high-level planning corrections using an additional MLLM, with limited utilization of failed samples to correct low-level contact poses. To address this gap, we propose Autonomous Interactive...

10.48550/arxiv.2406.11548 preprint EN arXiv (Cornell University) 2024-06-17

Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) we are the first to study the occlusion level of grasping, 2) we set up an evaluation benchmark consisting of large-scale synthetic data and...

10.48550/arxiv.2407.06168 preprint EN arXiv (Cornell University) 2024-07-08

Recent advancements in industrial anomaly detection have been hindered by the lack of realistic datasets that accurately represent real-world conditions. Existing algorithms are often developed and evaluated using idealized datasets, which deviate significantly from real-life scenarios characterized by environmental noise and data corruption such as fluctuating lighting conditions, variable object poses, and unstable camera positions. To address this gap, we introduce the Realistic Anomaly Detection (RAD)...

10.48550/arxiv.2410.00713 preprint EN arXiv (Cornell University) 2024-10-01

10.1109/iros58592.2024.10801669 article EN 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024-10-14

Deep learning has led to great progress in the detection of mobile (i.e. movement-capable) objects in urban driving scenes in recent years. Supervised approaches typically require the annotation of large training sets; there has thus been interest in leveraging weakly, semi- or self-supervised methods to avoid this, with much success. Whilst weakly and semi-supervised methods require some annotation, self-supervised methods have used cues such as motion to relieve the need for annotation altogether. However, a complete absence of annotation typically degrades their performance, and ambiguities that...

10.48550/arxiv.2209.10471 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Human-robot interaction (HRI) research using vision for robot teleoperation is closely related to the development of the field of artificial intelligence. There are many methods for imitating human postures with humanoid robots. However, it is not easy to fully parameterize human movements for collaborative robots because of the difference in morphology between humans and robots. This paper proposes a human-robot interaction method based on digital twin technology to try to solve this problem. Using this method, an operator can control a robot at a distance to accomplish...

10.1109/tocs56154.2022.10016008 article EN 2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS) 2022-12-11

Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of naive fusion in feature space through summation or concatenation, which does not take into account the different...

10.48550/arxiv.2003.07289 preprint EN other-oa arXiv (Cornell University) 2020-01-01
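The shortcoming of naive summation or concatenation noted above is that it weights every modality equally even when one input is degraded. One common alternative, sketched here purely as an illustration (the scoring mechanism and names are assumptions, not this paper's method), is to gate each modality by a learned reliability score:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def gated_fusion(feats, scores):
    """Fuse per-modality features with reliability gates.

    feats:  (M, D) one feature vector per modality (e.g. RGB, depth)
    scores: (M,) scalar confidence logits; a degraded modality gets a low score
    Unlike plain summation, the softmax gate can down-weight a corrupted input.
    """
    w = softmax(scores)                  # (M,) weights summing to 1
    return (w[:, None] * feats).sum(axis=0)
```

If the depth channel is missing or noisy, driving its score down makes the fused feature collapse gracefully toward the RGB feature instead of being dragged by garbage values.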

A truly generalizable approach to rigid segmentation and motion estimation is fundamental to the 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two interconnected, lightweight heads. These heads predict segmentation masks using point-level invariant features and estimate motion from equivariant features, all without the need for category information....

10.48550/arxiv.2306.05584 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Robotic research encounters a significant hurdle when it comes to the intricate task of grasping objects that come in various shapes, materials, and textures. Unlike many prior investigations that leaned heavily on specialized point-cloud cameras or abundant RGB visual data to gather 3D insights for object-grasping missions, this paper introduces a pioneering approach called RGBGrasp. This method depends on a limited set of RGB views to perceive surroundings containing transparent and specular objects and to achieve accurate grasping....

10.48550/arxiv.2311.16592 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames. However, such methods neither fully exploit 3D point-wise geometric correspondences, nor effectively tackle the ambiguities in photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes the Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework that can consider 3D spatial information,...

10.48550/arxiv.2209.06351 preprint EN other-oa arXiv (Cornell University) 2022-01-01
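The photometric relation that self-supervised depth methods rely on compares a target frame against source frames warped into the target view using the predicted depth and pose. One widely used way to soften the occlusion ambiguity mentioned above is the per-pixel minimum over source frames (popularized by Monodepth2); the sketch below shows that trick in isolation and is not the DevNet objective itself.

```python
import numpy as np

def min_reprojection_loss(I_tgt, warped_srcs):
    """Photometric error against each warped source frame, taking the
    pixel-wise minimum so occluded pixels can pick the source frame in
    which they are actually visible.

    I_tgt: (H, W) target frame (grayscale for brevity)
    warped_srcs: list of (H, W) source frames warped into the target view
    """
    errs = np.stack([np.abs(I_tgt - w) for w in warped_srcs])  # (S, H, W)
    return float(errs.min(axis=0).mean())  # min over sources, mean over pixels
```

A pixel occluded in one source frame but visible in another contributes its small (correct) error rather than the large occlusion-induced one, so the loss stops penalizing depths that are actually right.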