NFDI4DS | UHH-SEMS - Publication Details

CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention

OPENALEX - Publications

Wenxiao Wang Wei Chen Qibo Qiu Long Chen Boxi Wu and 3 more

While features of different scales are perceptually important to visual inputs, existing vision transformers do not yet take advantage of them explicitly. To this end, we first propose a cross-scale vision transformer, CrossFormer. It introduces a cross-scale embedding layer (CEL) and a long-short distance attention (LSDA). On the one hand, CEL blends each token with multiple patches of different scales, providing the self-attention module itself with cross-scale features. On the other hand, LSDA splits the self-attention module into a short-distance one and a long-distance counterpart, which not only...

10.1109/tpami.2023.3341806 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-12-19

Toward Better Accuracy-Efficiency Trade-Offs: Divide and Co-Training

Shuai Zhao Liguang Zhou Wenxiao Wang Deng Cai Tin Lun Lam and 1 more

The width of a neural network matters since increasing the width will necessarily increase the model capacity. However, the performance of a network does not improve linearly with the width and soon gets saturated. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To prove it, one large network is divided into several small ones regarding its parameters and regularization components. Each of these small networks has a fraction of the original one's parameters. We then train these small networks together and make them...

10.1109/tip.2022.3201602 article EN IEEE Transactions on Image Processing 2022-01-01

M3CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

Qibo Qiu Honghui Yang Jiang Jian Shun Zhang Haochao Ying and 3 more

10.1109/tcsvt.2025.3553525 article EN IEEE Transactions on Circuits and Systems for Video Technology 2025-01-01

Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-Dataset 3D Object Detection

Zhanwei Zhang Minghao Chen Shuai Xiao Liang Peng Hengjia Li and 5 more

10.1109/cvpr52733.2024.01448 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Learning Occupancy for Monocular 3D Object Detection

Liang Peng Junkai Xu Haoran Cheng Yang Zheng Xiaopei Wu and 4 more

10.1109/cvpr52733.2024.00979 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

Chenxi Huang Tong He Haidong Ren Wenxiao Wang Binbin Lin and 1 more

Compared to typical multi-sensor systems, monocular 3D object detection has attracted much attention due to its simple configuration. However, there is still a significant gap between LiDAR-based and monocular-based methods. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. Specifically, objects with different depths can appear with the same bounding boxes and similar visual features in the 2D image. Unfortunately, the network cannot accurately distinguish different depths from such...

10.1109/tip.2023.3333225 article EN IEEE Transactions on Image Processing 2023-01-01

SelFLoc: Selective feature fusion for large-scale point cloud-based place recognition

Qibo Qiu Wenxiao Wang Haochao Ying Dingkun Liang Haiming Gao and 1 more

10.1016/j.knosys.2024.111794 article EN Knowledge-Based Systems 2024-04-23

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

Chenxi Huang Tong He Haidong Ren Wenxiao Wang Binbin Lin and 1 more

Compared to typical multi-sensor systems, monocular 3D object detection has attracted much attention due to its simple configuration. However, there is still a significant gap between LiDAR-based and monocular-based methods. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. Specifically, objects with different depths can appear with the same bounding boxes and similar visual features in the 2D image. Unfortunately, the network cannot accurately distinguish different depths from such...

10.48550/arxiv.2212.10049 preprint EN other-oa arXiv (Cornell University) 2022-01-01