Hang Xu

ORCID: 0000-0002-1067-8670
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Seismic Waves and Analysis
  • Topic Modeling
  • Geophysical and Geoelectrical Methods
  • Image and Object Detection Techniques
  • Visual Attention and Saliency Detection
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Video Surveillance and Tracking Methods
  • Earthquake Detection and Analysis
  • Autonomous Vehicle Technology and Safety
  • Generative Adversarial Networks and Image Synthesis
  • Optical measurement and interference techniques
  • Image Processing Techniques and Applications
  • Adversarial Robustness in Machine Learning
  • Text and Document Classification Technologies
  • Reinforcement Learning in Robotics
  • Computer Graphics and Visualization Techniques
  • Remote-Sensing Image Classification
  • Energy Efficient Wireless Sensor Networks
  • Advanced Measurement and Detection Methods
  • Digital Media Forensic Detection

Huawei Technologies (Sweden)
2022-2025

Huawei Technologies (Canada)
2022-2024

Hangzhou Dianzi University
2022-2024

Changchun Institute of Technology
2021

Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention. However, the employment of self-attention modules results in quadratic complexity in both computation and memory usage. Various attempts at approximating self-attention with linear complexity have been made in Natural Language Processing. However, an in-depth analysis in this work shows that they are either theoretically flawed or empirically ineffective for visual recognition. We further identify...

10.48550/arxiv.2110.11945 preprint EN other-oa arXiv (Cornell University) 2021-01-01
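To make the quadratic cost mentioned in the abstract above concrete, here is a minimal counting sketch. It is not code from the paper: it assumes a square image cut into non-overlapping square patches, and the function names and the 224/448-pixel and 16-pixel values are illustrative assumptions.

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image (assumes exact division)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def attention_entries(n_tokens: int) -> int:
    """Entries in one full self-attention score matrix (n_tokens x n_tokens)."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    # Illustrative (assumed) resolutions and patch size.
    for size in (224, 448):
        n = num_tokens(size, 16)
        print(f"image {size}px -> {n} tokens -> {attention_entries(n)} score entries")
```

Doubling the image side quadruples the token count and multiplies the number of score entries by 16, which is the growth that linear-complexity approximations try to avoid.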

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection, a long-standing research topic for visual perception in autonomous driving. The dominant paradigms rely on purely CNN-based architectures, which often fail to incorporate relations of long-range lane points and global contexts induced by surrounding objects (e.g., pedestrians, vehicles). Inspired by recent advances of the transformer encoder-decoder architecture in various vision tasks, we move forwards...

10.1609/aaai.v36i1.19961 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

10.1109/cvpr52733.2024.01648 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Vision-language navigation (VLN) is a challenging task due to its large searching space in the environment. To address this problem, previous works have proposed methods of fine-tuning a model that is pretrained on large-scale datasets. However, conventional fine-tuning methods require extra human-labeled data and lack self-exploration capabilities in environments, which hinders their generalization to unseen scenes. To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration...

10.18653/v1/2022.acl-long.332 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset. Unlike Trans10K-v1, which has only two limited categories, our new dataset has several appealing benefits. (1) It has 11 fine-grained categories of transparent objects commonly occurring in the human domestic environment, making it more practical for real-world application. (2) Trans10K-v2 brings more challenges for current advanced segmentation methods than its former version. Furthermore, a novel...

10.48550/arxiv.2101.08461 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Existing text-guided image manipulation methods aim to modify the appearance of the image or to edit a few objects in a virtual or simple scenario, which is far from practical application. In this work, we study a novel task of text-guided image manipulation on the entity level in the real world. The task imposes three basic requirements: (1) to edit the entity consistent with the text descriptions, (2) to preserve the text-irrelevant regions, and (3) to merge the manipulated entity into the image naturally. To this end, we propose a new transformer-based framework based on the two-stage image synthesis method, namely ManiTrans, which can...

10.1109/cvpr52688.2022.01044 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

As one of the fundamental components of object detection, intersection-over-union (IoU) calculations between two bounding boxes play an important role in sample selection, the NMS operation, and the evaluation of object detection algorithms. This procedure is well-defined and solved for planar images, while it is challenging for spherical ones. Some existing methods utilize planar bounding boxes to represent spherical objects. However, they are biased due to the distortions of spherical objects. Others use spherical rectangles as unbiased representations, but adopt excessive approximate...

10.1609/aaai.v36i1.19929 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
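For reference, the planar case that the abstract above calls well-defined and solved can be sketched as follows. This is a minimal illustration for axis-aligned boxes only, not the spherical method of the paper; the `(x_min, y_min, x_max, y_max)` box layout and the name `planar_iou` are assumptions made for this example.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in image coordinates.
Box = Tuple[float, float, float, float]


def planar_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned planar boxes."""
    # Overlap extents, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs yield 0.0 rather than a division error.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(planar_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

On a sphere the boxes are no longer flat rectangles, so these min/max overlap formulas do not carry over directly, which is the difficulty the abstract points to.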

Self-supervised learning (SSL), especially contrastive methods, has attracted attention recently as it learns effective transferable representations without semantic annotations. A common practice for self-supervised pre-training is to use as much data as possible. For a specific downstream task, however, involving irrelevant data in pre-training may degrade the downstream performance, as observed in our extensive experiments. On the other hand, for existing SSL methods, it is burdensome and infeasible to use different downstream-task-customized datasets...

10.1609/aaai.v36i2.20079 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Rotated object detection is a challenging task due to the difficulties of locating rotated objects and separating them effectively from the background. For rotated object prediction, researchers have explored numerous regression-based and classification-based approaches to predict a rotation angle. However, both paradigms are constrained by flaws that make it difficult to accurately predict angles, such as multi-solution and boundary issues, which limit the performance upper bound of detectors. To address these issues, we propose circular...

10.3390/electronics12153265 article EN Electronics 2023-07-29
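The boundary issue named in the abstract above can be illustrated with a small sketch. It shows only the problem, not the method the paper proposes; the 180-degree angle period and both function names are assumptions chosen for this example.

```python
def naive_angle_error(pred_deg: float, target_deg: float) -> float:
    """Plain absolute difference, which ignores that angles wrap around."""
    return abs(pred_deg - target_deg)


def periodic_angle_error(pred_deg: float, target_deg: float,
                         period_deg: float = 180.0) -> float:
    """Shortest distance between two angles under an assumed period."""
    diff = abs(pred_deg - target_deg) % period_deg
    return min(diff, period_deg - diff)


if __name__ == "__main__":
    # Two nearly identical orientations on opposite sides of the boundary.
    print(naive_angle_error(89.0, -89.0))
    print(periodic_angle_error(89.0, -89.0))
```

With a 180-degree period, orientations of 89 and -89 degrees differ by only 2 degrees, yet the naive difference reports 178, so a regression loss built on it penalizes a nearly correct prediction heavily.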

By providing a 360° field of view, spherical panoramas can convey vivid visual impressions. Thus, they are widely used in virtual reality systems and street view services. However, due to bandwidth or storage limitations, existing systems only provide sparsely captured panoramas and have limited interaction modes. Although there are methods that can synthesize novel views based on sparsely captured panoramas, the generated views all lie on the lines connecting the existing views. Therefore, these methods do not support free-viewpoint navigation. In this paper, we propose...

10.3390/electronics12081954 article EN Electronics 2023-04-21

Abstract. In spread spectrum induced polarization (SSIP) data processing, attenuation of background noise from the observed data is the essential step that improves the signal-to-noise ratio (SNR) of SSIP data. The time-domain spectral induced polarization based on pseudorandom sequence (TSIP) algorithm has been proposed to improve the SNR of these data. However, signal processing in background noise is still a challenging problem. We propose an enhanced correlation identification (ECI) algorithm to attenuate the background noise. In this algorithm, the cross-correlation matching method is helpful...

10.5194/npg-28-247-2021 article EN cc-by Nonlinear Processes in Geophysics 2021-05-19