- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Robotics and Sensor-Based Localization
- Advanced Vision and Imaging
- 3D Shape Modeling and Analysis
- Advanced Image and Video Retrieval Techniques
- Industrial Vision Systems and Defect Detection
- Multimodal Machine Learning Applications
- Image and Object Detection Techniques
- Video Analysis and Summarization
- Music and Audio Processing
- 3D Surveying and Cultural Heritage
- Visual Attention and Saliency Detection
Chinese University of Hong Kong, Shenzhen
2022-2024
Recent non-local self-attention methods have proven effective at capturing long-range dependencies for semantic segmentation. These methods usually form a similarity map of R^(C×C) (by compressing spatial dimensions) or R^(HW×HW) (by compressing channels) to describe the feature relations along either the channel or the spatial dimension, where C is the number of channels and H and W are the spatial dimensions of the input feature map. However, such practices tend to condense feature dependencies along the other dimension, hence causing attention missing, which might lead to inferior results for small/thin categories...
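As a rough illustration (not the paper's implementation), the two similarity maps the abstract contrasts can be sketched in NumPy: compressing the spatial dimensions yields a C×C channel map, while compressing the channel dimension yields an HW×HW spatial map.

```python
import numpy as np

def channel_similarity(feat):
    """C x C similarity map: spatial dims are flattened and used as
    the inner dimension, so each entry compares two channel responses."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)   # (C, HW)
    return x @ x.T               # (C, C)

def spatial_similarity(feat):
    """HW x HW similarity map: channels are used as the inner dimension,
    so each entry compares the feature vectors of two spatial positions."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)   # (C, HW)
    return x.T @ x               # (HW, HW)

feat = np.random.default_rng(0).normal(size=(8, 4, 4))
print(channel_similarity(feat).shape)  # (8, 8)
print(spatial_similarity(feat).shape)  # (16, 16)
```

Each map necessarily marginalizes over the other dimension, which is the "attention missing" the abstract refers to.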
Semantic Scene Completion (SSC) aims to reconstruct complete 3D scenes with precise voxel-wise semantics from single-view, incomplete input data, a crucial but highly challenging problem for scene understanding. Although SSC has seen significant progress in recent years due to the introduction of 2D semantic priors, occluded parts, especially rear-view scenes, are still poorly completed and segmented. To ameliorate this issue, we propose a novel deep learning framework for SSC, named Planar...
The non-local (NL) network has become a widely used technique for semantic segmentation, which computes an attention map to measure the relationships of each pixel pair. However, most current popular NL models tend to ignore the phenomenon that the calculated attention map appears to be very noisy, containing interclass and intraclass inconsistencies, which lowers the accuracy and reliability of NL methods. In this article, we figuratively denote these inconsistencies as noises and explore solutions to denoise them. Specifically, we inventively...
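For reference, a minimal NumPy sketch of the pixel-pair attention map a basic non-local block computes (dot-product affinity followed by a row-wise softmax); this is the generic NL formulation, not the denoising method the abstract proposes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_attention(feat):
    """Pixel-pair attention map of a basic non-local block.

    feat: (C, H, W). Returns an (HW, HW) row-stochastic map where
    entry (i, j) is the attention pixel i pays to pixel j.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)           # (C, HW)
    affinity = x.T @ x / np.sqrt(C)      # scaled dot-product affinity
    return softmax(affinity, axis=-1)    # each row sums to 1

feat = np.random.default_rng(1).normal(size=(4, 3, 3))
attn = nonlocal_attention(feat)
print(attn.shape)                        # (9, 9)
```

The inter/intraclass "noise" discussed in the abstract lives in this (HW, HW) map: rows belonging to the same class may attend to different regions, and a row may leak attention onto pixels of other classes.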
3D perception tasks, such as object detection and Bird's-Eye-View (BEV) segmentation using multi-camera images, have drawn significant attention recently. Despite the fact that accurately estimating both semantics and scene layouts is crucial for this task, existing techniques often neglect the synergistic effects of semantic and depth cues, leading to classification and position estimation errors. Additionally, the input-independent nature of initial queries also limits the learning capacity of Transformer-based...