Wenxiao Wang

ORCID: 0000-0002-6399-292X
Research Areas
  • Advanced Neural Network Applications
  • Robotics and Sensor-Based Localization
  • Video Surveillance and Tracking Methods
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Indoor and Outdoor Localization Technologies
  • Adversarial Robustness in Machine Learning
  • Visual Attention and Saliency Detection

Zhejiang University
2022-2025

Shanghai Artificial Intelligence Laboratory
2022

University of Nottingham Ningbo China
2022

While features of different scales are perceptually important to visual inputs, existing vision transformers do not yet take advantage of them explicitly. To this end, we first propose a cross-scale transformer, CrossFormer. It introduces a cross-scale embedding layer (CEL) and a long-short distance attention (LSDA). On the one hand, CEL blends each token with multiple patches of different scales, providing the self-attention module itself with cross-scale features. On the other hand, LSDA splits the self-attention module into a short-distance one and a long-distance counterpart, which not only...

10.1109/tpami.2023.3341806 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-12-19
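
The short-distance and long-distance split mentioned in the CrossFormer abstract can be illustrated with a small grouping sketch. It assumes, as a simplification not stated in the abstract, that short-distance attention groups adjacent tokens into square blocks and that long-distance attention groups tokens sampled at a fixed interval. The function names, grid size, group size, and interval are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def sda_groups(size, group):
    """Short-distance grouping (assumed): each group is a block of adjacent tokens."""
    groups = {}
    for r in range(size):
        for c in range(size):
            groups.setdefault((r // group, c // group), []).append((r, c))
    return list(groups.values())


def lda_groups(size, interval):
    """Long-distance grouping (assumed): tokens sampled every `interval` steps share a group."""
    groups = {}
    for r in range(size):
        for c in range(size):
            groups.setdefault((r % interval, c % interval), []).append((r, c))
    return list(groups.values())


def attention_pairs(groups):
    """Count query-key pairs when attention runs only inside each group."""
    return sum(len(g) ** 2 for g in groups)


SIZE = 8       # hypothetical 8 x 8 token grid
GROUP = 4      # hypothetical short-distance group size
INTERVAL = 2   # hypothetical long-distance sampling interval

short = sda_groups(SIZE, GROUP)
long_ = lda_groups(SIZE, INTERVAL)
full_pairs = (SIZE * SIZE) ** 2  # pairs under ungrouped (global) self-attention
```

In this toy setting, both groupings restrict attention to groups of 16 tokens, so each needs fewer query-key pairs than global attention over all 64 tokens, while the long-distance groups still connect tokens that are far apart on the grid.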

The width of a neural network matters since increasing the width will necessarily increase the model capacity. However, the performance of a network does not improve linearly with the width and soon gets saturated. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To prove it, one large network is divided into several small ones regarding its parameters and regularization components. Each of these small networks has a fraction of the original one's parameters. We then train these small networks together and make them...

10.1109/tip.2022.3201602 article EN IEEE Transactions on Image Processing 2022-01-01
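
The idea of dividing one large network into several small ones can be made concrete with a rough parameter count. The sketch below assumes that the parameters of a convolution layer grow roughly quadratically with its width, so shrinking the width by the square root of the number of networks keeps the total parameter budget about the same. The scaling rule, the prediction averaging, and all names and sizes here are illustrative assumptions, not details taken from the abstract.

```python
import math


def conv_params(c_in, c_out, kernel):
    """Weight count of a plain convolution layer (bias ignored)."""
    return c_in * c_out * kernel * kernel


def divided_width(width, num_nets):
    """Assumed rule: shrink width by sqrt(num_nets) so each net holds ~1/num_nets of the parameters."""
    return round(width / math.sqrt(num_nets))


def average_predictions(per_net_probs):
    """Ensemble the small networks by averaging their class probabilities."""
    count = len(per_net_probs)
    return [sum(column) / count for column in zip(*per_net_probs)]


WIDTH = 256    # hypothetical channel width of the large network
KERNEL = 3     # hypothetical kernel size
NUM_NETS = 4   # hypothetical number of small networks

small_width = divided_width(WIDTH, NUM_NETS)
large_params = conv_params(WIDTH, WIDTH, KERNEL)
small_params = conv_params(small_width, small_width, KERNEL)
```

With these hypothetical sizes, four small layers together hold the same number of weights as the one large layer, which is the sense in which each small network has a fraction of the original parameters.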

10.1109/cvpr52733.2024.00979 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Compared to typical multi-sensor systems, monocular 3D object detection has attracted much attention due to its simple configuration. However, there is still a significant gap between LiDAR-based and monocular-based methods. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. Specifically, objects with different depths can appear with the same bounding boxes and similar visual features in the 2D image. Unfortunately, the network cannot accurately distinguish different depths from such...

10.1109/tip.2023.3333225 article EN IEEE Transactions on Image Processing 2023-01-01
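
The depth ambiguity described in the abstract above can be illustrated with an idealized pinhole camera, where the projected height of an object is the focal length times its real height divided by its depth. This is a generic geometric sketch, not the paper's method; the focal length and object sizes are hypothetical values chosen for illustration.

```python
def projected_height(focal_px, real_height_m, depth_m):
    """Image-plane height in pixels under an idealized pinhole camera model."""
    return focal_px * real_height_m / depth_m


FOCAL_PX = 720.0  # hypothetical focal length in pixels

# Two hypothetical objects: one small and near, one twice as tall and twice as far.
near_small = projected_height(FOCAL_PX, real_height_m=1.5, depth_m=10.0)
far_large = projected_height(FOCAL_PX, real_height_m=3.0, depth_m=20.0)

# The two projections coincide, so 2D box height alone cannot separate the depths.
same_box_height = near_small == far_large
```

Because scaling an object's size and depth by the same factor leaves its projection unchanged, a single image does not determine depth without extra cues such as a prior on object size.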

Compared to typical multi-sensor systems, monocular 3D object detection has attracted much attention due to its simple configuration. However, there is still a significant gap between LiDAR-based and monocular-based methods. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. Specifically, objects with different depths can appear with the same bounding boxes and similar visual features in the 2D image. Unfortunately, the network cannot accurately distinguish different depths from such...

10.48550/arxiv.2212.10049 preprint EN other-oa arXiv (Cornell University) 2022-01-01