Qi Song

ORCID: 0000-0002-5362-4103
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Robotics and Sensor-Based Localization
  • Advanced Vision and Imaging
  • 3D Shape Modeling and Analysis
  • Advanced Image and Video Retrieval Techniques
  • Industrial Vision Systems and Defect Detection
  • Multimodal Machine Learning Applications
  • Image and Object Detection Techniques
  • Video Analysis and Summarization
  • Music and Audio Processing
  • 3D Surveying and Cultural Heritage
  • Visual Attention and Saliency Detection

Chinese University of Hong Kong, Shenzhen
2022-2024

Recent non-local self-attention methods have proven to be effective in capturing long-range dependencies for semantic segmentation. These methods usually form a similarity map of R^(CxC) (by compressing spatial dimensions) or R^(HWxHW) (by compressing channels) to describe the feature relations along either channel or spatial dimensions, where C is the number of channels, and H and W are the spatial dimensions of the input feature map. However, such practices tend to condense feature dependencies along the other dimensions, hence causing attention missing, which might lead to inferior results for small/thin categories...

10.1609/aaai.v36i2.20126 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
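
The two similarity-map shapes mentioned in the abstract above can be illustrated with a minimal plain-Python sketch. It assumes a feature map stored as C rows, each holding the H*W flattened spatial positions, and a row-wise softmax for normalization. The helper names (matmul, softmax_rows, channel_similarity, spatial_similarity) are hypothetical and chosen for illustration only; this is generic non-local attention, not the method proposed in the paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Normalize each row so that its entries are positive and sum to 1."""
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def channel_similarity(feat):
    """feat is C x (H*W). Spatial positions are summed out, giving a C x C map."""
    return softmax_rows(matmul(feat, transpose(feat)))

def spatial_similarity(feat):
    """feat is C x (H*W). Channels are summed out, giving an (H*W) x (H*W) map."""
    return softmax_rows(matmul(transpose(feat), feat))

# Toy feature map (assumed values): C = 2 channels, H = 1, W = 3.
feat = [[1.0, 0.0, 2.0],
        [0.5, 1.0, 0.0]]

chan = channel_similarity(feat)   # 2 x 2
spat = spatial_similarity(feat)   # 3 x 3
print(len(chan), len(chan[0]), len(spat), len(spat[0]))
```

Each map describes relations along one axis only: the C x C map carries no per-position information and the HW x HW map carries no per-channel information, which is the condensation the abstract refers to.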

Semantic Scene Completion (SSC) aims to reconstruct complete 3D scenes with precise voxel-wise semantics from single-view incomplete input data, a crucial but highly challenging problem for scene understanding. Although SSC has seen significant progress due to the introduction of 2D semantic priors in recent years, occluded parts, especially in rear-view scenes, are still poorly completed and segmented. To ameliorate this issue, we propose a novel deep learning framework for SSC, named Planar...

10.1109/tmm.2023.3234441 article EN IEEE Transactions on Multimedia 2023-01-01

The non-local (NL) network has become a widely used technique for semantic segmentation, which computes an attention map to measure the relationships of each pixel pair. However, most current popular NL models tend to ignore the phenomenon that the calculated attention map appears to be very noisy, containing interclass and intraclass inconsistencies, which lowers the accuracy and reliability of the NL methods. In this article, we figuratively denote these inconsistencies as attention noises and explore solutions to denoise them. Specifically, we inventively...

10.1109/tnnls.2022.3214216 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-01-17

3D perception tasks, such as 3D object detection and Bird's-Eye-View (BEV) segmentation using multi-camera images, have drawn significant attention recently. Although accurately estimating both semantic and 3D scene layouts is crucial for this task, existing techniques often neglect the synergistic effects of semantic and depth cues, leading to classification and position estimation errors. Additionally, the input-independent nature of initial queries also limits the learning capacity of Transformer-based...

10.1109/tip.2024.3352808 article EN IEEE Transactions on Image Processing 2024-01-01

10.48550/arxiv.2408.06901 preprint EN arXiv (Cornell University) 2024-08-13

10.1109/tcsvt.2024.3499327 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-01-01