- Advanced Vision and Imaging
- Robot Manipulation and Learning
- Robotics and Sensor-Based Localization
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Prostate Cancer Treatment and Research
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Robotics and Automated Systems
- Optical Measurement and Interference Techniques
- Remote Sensing and LiDAR Applications
- Image Enhancement Techniques
- AI-based Problem Solving and Planning
- Prostate Cancer Diagnosis and Treatment
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Malware Detection Techniques
- Image Retrieval and Classification Techniques
- Teleoperation and Haptic Systems
- Autonomous Vehicle Technology and Safety
- Immunotherapy and Immune Responses
- Image Processing Techniques and Applications
- Optical Systems and Laser Technology
- Digital Image Processing Techniques
Xuzhou Medical College
2024-2025
University of Oxford
2022-2024
Shenzhen Institutes of Advanced Technology
2022-2024
University of Chinese Academy of Sciences
2024
Chinese Academy of Sciences
2022
Robotic research encounters a significant hurdle when it comes to the intricate task of grasping objects that come in various shapes, materials, and textures. Unlike many prior investigations that heavily leaned on specialized point-cloud cameras or abundant RGB visual data to gather 3D insights for object-grasping missions, this paper introduces a pioneering approach called RGBGrasp. This method depends on a limited set of views to perceive surroundings containing transparent and specular objects and to achieve accurate grasping....
This study aimed to explore the factors affecting pathologic complete response (pCR) and prognosis of locally advanced prostate cancer (LAPCa) or metastatic prostate cancer (mPCa) treated with neoadjuvant androgen deprivation therapy (ADT) plus abiraterone acetate (AA). This retrospective study enrolled patients diagnosed with LAPCa or mPCa, who were divided into three groups based on the prostate-specific antigen (PSA) nadir following ADT plus AA: group 1 (PSA ≤ 0.2 ng/ml), group 2 (PSA 0.2-4.0 ng/ml), and group 3 (PSA > 4.0 ng/ml). Univariate and multivariate logistic...
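The three-way stratification described above can be expressed as a simple lookup. The boundary handling below (values of exactly 0.2 or 4.0 ng/ml falling in the lower group) is an assumption; the study's exact convention for boundary values is not stated in the snippet.

```python
def psa_group(psa_nadir_ng_ml: float) -> int:
    """Assign the PSA-nadir group used in the study's stratification.

    Group 1: PSA <= 0.2 ng/ml; group 2: 0.2-4.0 ng/ml; group 3: > 4.0 ng/ml.
    Boundary handling is an assumption, not confirmed by the abstract.
    """
    if psa_nadir_ng_ml <= 0.2:
        return 1
    elif psa_nadir_ng_ml <= 4.0:
        return 2
    return 3

groups = [psa_group(p) for p in (0.1, 1.5, 7.0)]  # -> [1, 2, 3]
```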
A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) the high computational cost of MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear...
Neural Radiance Fields (NeRF) has demonstrated remarkable 3D reconstruction capabilities with dense view images. However, its performance significantly deteriorates under sparse view settings. We observe that learning the consistency of pixels among different views is crucial for improving reconstruction quality in such cases. In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize the consistency of both multi-view and single-view pixels. Specifically, ConsistentNeRF employs depth-derived...
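The multi-view pixel consistency mentioned above rests on a standard geometric step: back-project a pixel to 3D using its depth, then re-project it into another view, where the two pixels should agree in color. The sketch below illustrates only this generic reprojection; the intrinsics, poses, and function names are illustrative assumptions, not ConsistentNeRF's actual implementation.

```python
import numpy as np

def reproject(uv, depth, K, R, t):
    """Back-project pixel `uv` at `depth` in view 1, then project into view 2.

    K: 3x3 intrinsics (shared by both views); R, t: pose of view 2
    relative to view 1. Returns the corresponding pixel in view 2 and
    the point's depth in that view.
    """
    u, v = uv
    # Pixel -> 3D point in view-1 camera coordinates.
    p_cam1 = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Transform into view-2 camera coordinates and project.
    p_cam2 = R @ p_cam1 + t
    uvw = K @ p_cam2
    return uvw[:2] / uvw[2], p_cam2[2]

# Toy example: identity rotation, view 2 translated 0.1 m along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([-0.1, 0.0, 0.0])
uv2, d2 = reproject((320, 240), 2.0, K, R, t)  # -> pixel (295, 240), depth 2.0
# A consistency loss would compare the colors sampled at uv (view 1)
# and uv2 (view 2); depth supervision keeps the correspondence valid.
```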
Location information is pivotal for the automation and intelligence of terminal devices in edge-cloud IoT systems, such as autonomous vehicles and augmented reality. However, achieving reliable positioning across diverse applications remains challenging due to significant training costs and the necessity of densely collected data. To tackle these issues, we have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc. The proposed model demonstrates exceptional...
Despite advancements in self-supervised monocular depth estimation, challenges persist in dynamic scenarios due to the dependence on assumptions about a static world. In this paper, we present MGDepth, a Motion-Guided Cost Volume Depth Net, to achieve precise depth estimation for both dynamic objects and static backgrounds, all while maintaining computational efficiency. To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse depth to create a novel static reference frame. This frame is then utilized to build a motion-guided cost volume...
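A cost volume, the building block named above, scores a set of depth (or disparity) hypotheses per pixel by comparing a reference feature with source features warped according to each hypothesis. The 1D scanline sketch below uses hypothetical shapes and a crude roll-based warp purely to illustrate the idea; it is not MGDepth's architecture.

```python
import numpy as np

def cost_volume_1d(ref, src, max_disp):
    """L1 matching cost between a reference scanline and a source scanline
    shifted by each disparity hypothesis d = 0..max_disp-1.

    ref, src: (W, C) per-pixel feature vectors. Returns (max_disp, W) costs.
    """
    W = ref.shape[0]
    cost = np.zeros((max_disp, W))
    for d in range(max_disp):
        shifted = np.roll(src, d, axis=0)   # crude "warp" by disparity d
        cost[d] = np.abs(ref - shifted).sum(axis=1)
        cost[d, :d] = np.inf                # out-of-frame pixels are invalid
    return cost

# The best-matching hypothesis per pixel is the argmin over the volume.
ref = np.random.default_rng(0).normal(size=(8, 4))
src = np.roll(ref, -2, axis=0)              # source is ref shifted by 2 pixels
disp = cost_volume_1d(ref, src, max_disp=4).argmin(axis=0)
# disp recovers disparity 2 wherever the hypothesis is in-frame.
```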
Although significant progress has been made in the field of 2D-based interactive editing, fine-grained 3D-based interactive editing remains relatively unexplored. This limitation can be attributed to two main challenges: the lack of an efficient 3D representation that is robust to different modifications, and the absence of an effective 3D segmentation method. In this paper, we introduce a novel interactive editing algorithm based on radiance fields, which we refer to as SERF. Our method entails creating a neural mesh by integrating multi-view algorithms with pre-trained...
The development of Neural Radiance Fields (NeRFs) has provided a potent representation for encapsulating the geometric and appearance characteristics of 3D scenes. Enhancing the capabilities of NeRFs in open-vocabulary semantic perception tasks has been a recent focus. However, current methods that extract semantics directly from Contrastive Language-Image Pretraining (CLIP) for semantic field learning encounter difficulties due to the noisy and view-inconsistent semantics provided by CLIP. To tackle these limitations, we propose OV-NeRF, which...
Despite the advancements in deep learning for camera relocalization tasks, obtaining the ground truth pose labels required for the training process remains a costly endeavor. While current weakly supervised methods excel at lightweight label generation, their performance notably declines in scenarios with sparse views. In response to this challenge, we introduce WSCLoc, a system capable of being customized to various deep learning-based relocalization models to enhance their performance under weakly-supervised and sparse view conditions. This is realized with two...
Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failure actions is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities in various tasks. To unleash general MLLMs as an end-to-end robotic agent, we introduce Self-Corrected (SC)-MLLM, equipping our model...
Neural Radiance Field (NeRF) is driving the development of 3D reconstruction technology. Several NeRF variants have been proposed to improve rendering accuracy and speed. One of the most significant variants, TensoRF, uses a 4D tensor to model the radiance field, resulting in improved speed. However, rendering quality remains limited. This study presents an improved TensoRF that addresses the aforementioned issues by reconstructing its multilayer perceptron network. Increasing the number of neurons in the input and network layers improves render...
This paper addresses the challenge of reconstructing surfaces from sparse view inputs, where ambiguity and occlusions due to missing information pose significant hurdles. We present a novel approach, named EpiS, that incorporates epipolar information into the reconstruction process. Existing methods in sparse-view neural surface learning have mainly focused on mean and variance considerations using cost volumes for feature extraction. In contrast, our method aggregates coarse cost volume features extracted from multiple...
The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly. However, these methods typically focus on high-level planning corrections using an additional MLLM, with limited utilization of failed samples for correcting low-level contact poses. To address this gap, we propose Autonomous Interactive...
Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) we are the first to study the occlusion level of grasping; 2) we set up an evaluation benchmark consisting of large-scale synthetic data and...
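The "occlusion level" contribution mentioned above can be made concrete with one natural definition: the fraction of the target object that clutter hides from the camera. The formula and mask-based computation below are a hypothetical sketch; TARGO's exact metric may differ.

```python
import numpy as np

def occlusion_level(full_mask, visible_mask):
    """Fraction of the target object hidden by clutter.

    full_mask: boolean mask of the whole (amodal) target object;
    visible_mask: boolean mask of its visible pixels only.
    Returns 1 - visible_area / full_area.
    This definition is an assumption, not TARGO's published metric.
    """
    full = full_mask.sum()
    return 1.0 - visible_mask.sum() / full

full = np.ones((4, 4), dtype=bool)
visible = full.copy()
visible[:2] = False                        # top half occluded by a neighbor
level = occlusion_level(full, visible)     # -> 0.5
```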
Recent advancements in industrial anomaly detection have been hindered by the lack of realistic datasets that accurately represent real-world conditions. Existing algorithms are often developed and evaluated using idealized datasets, which deviate significantly from real-life scenarios characterized by environmental noise and data corruption, such as fluctuating lighting conditions, variable object poses, and unstable camera positions. To address this gap, we introduce the Realistic Anomaly Detection (RAD)...
Deep learning has led to great progress in the detection of mobile (i.e. movement-capable) objects in urban driving scenes in recent years. Supervised approaches typically require the annotation of large training sets; there has thus been interest in leveraging weakly, semi- or self-supervised methods to avoid this, with much success. Whilst weakly and semi-supervised methods still require some annotation, self-supervised methods have used cues such as motion to relieve the need for annotation altogether. However, a complete absence of annotation degrades their performance, and ambiguities that...
Human-robot interaction (HRI) research using vision for robot teleoperation is closely related to the development of the field of artificial intelligence. There are many methods for imitating human postures with humanoid robots. However, it is not easy to fully parameterize human movements for collaborative robots because of the difference in morphology between humans and robots. This paper proposes a human-robot teleoperation method based on digital twin technology to try to solve this problem. Using this method, an operator can control a robot at a distance to accomplish...
Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and how to deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of naive feature space fusion through summation or concatenation, which does not take into account the different...
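The conjecture above, that summation or concatenation ignores how reliable each modality currently is, suggests weighting modalities by a confidence score instead. A minimal sketch of such gated fusion follows; the hand-set scores and function names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_fuse(features, scores):
    """Fuse per-modality feature vectors with softmax confidence weights.

    features: list of (C,) arrays; scores: one scalar logit per modality.
    A degraded modality (low score) contributes little to the fused
    feature, unlike plain summation, which weights all modalities equally.
    """
    w = softmax(np.asarray(scores, dtype=float))
    return sum(wi * f for wi, f in zip(w, features))

img_feat = np.array([1.0, 0.0, 0.0])
depth_feat = np.array([0.0, 1.0, 0.0])   # imagine the depth sensor failed
fused = gated_fuse([img_feat, depth_feat], scores=[4.0, -4.0])
# fused is dominated by the image branch because its confidence is higher
```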
A truly generalizable approach to rigid segmentation and motion estimation is fundamental to the 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two interconnected, lightweight heads. These heads predict segmentation masks using point-level invariant features and estimate motion from equivariant features, all without the need for category information....
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames. However, such methods neither fully exploit 3D point-wise geometric correspondences, nor effectively tackle ambiguities in the photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes the Density Volume Construction Network (DevNet), a novel self-supervised framework that can consider 3D spatial information,...
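The 2D photometric relation these methods build on can be stated compactly: warp the adjacent frame into the current view using predicted depth and pose, then penalize the remaining color difference. The toy version below shows only the L1 error with an occlusion mask; real systems such as DevNet add further terms, and all names here are illustrative assumptions.

```python
import numpy as np

def photometric_l1(target, warped, mask=None):
    """Mean absolute color difference between the target frame and the
    source frame warped into the target view.

    target, warped: (H, W, 3) images. `mask` (optional, boolean (H, W))
    excludes occluded or out-of-view pixels from the average, which is
    one way to soften the warping ambiguities noted in the abstract.
    """
    err = np.abs(target - warped).mean(axis=-1)  # per-pixel L1 over channels
    if mask is not None:
        err = err[mask]
    return err.mean()

target = np.zeros((4, 4, 3))
warped = np.zeros((4, 4, 3))
warped[0, 0] = 1.0                          # one occluded pixel disagrees
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False                          # masking it removes the error
loss_masked = photometric_l1(target, warped, mask)  # -> 0.0
```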