NFDI4DS | UHH-SEMS - Publication Details

Kaixuan Wang

ORCID: 0000-0001-9210-0233

About

Contact & Profiles

A5100619525

Research Areas

Advanced Vision and Imaging
Robotics and Sensor-Based Localization
Optical measurement and interference techniques
Image Processing Techniques and Applications
Advanced Image Processing Techniques
Advanced Image and Video Retrieval Techniques
Robotic Path Planning Algorithms
Image Enhancement Techniques
3D Surveying and Cultural Heritage
EEG and Brain-Computer Interfaces
Gait Recognition and Analysis
Hand Gesture Recognition Systems
Image Retrieval and Classification Techniques
Vehicle License Plate Recognition
Neural Networks and Applications
Currency Recognition and Detection
Distributed Control Multi-Agent Systems
Constructed Wetlands for Wastewater Treatment
Generative Adversarial Networks and Image Synthesis
Advanced Neural Network Applications
Video Surveillance and Tracking Methods
Wastewater Treatment and Nitrogen Removal
Advanced Computing and Algorithms
Image and Object Detection Techniques
Micro and Nano Robotics

Aviation Industry Corporation of China (China)
2024

East China University of Science and Technology
2024

University of Hong Kong
2018-2024

Hong Kong University of Science and Technology
2018-2024

First Affiliated Hospital of Dalian Medical University
2024

Dalian Medical University
2024

Harbin Institute of Technology
2024

University of St Andrews
2024

Anhui University of Technology
2024

Beihang University
2023

MVDepthNet: Real-Time Multiview Depth Estimation Neural Network

OPENALEX - Publications

Kaixuan Wang, Shaojie Shen

Although deep neural networks have been widely applied to computer vision problems, extending them into multiview depth estimation is non-trivial. In this paper, we present MVDepthNet, a convolutional network to solve the depth estimation problem given several image-pose pairs from a localized monocular camera in neighbor viewpoints. Multiview observations are encoded in a cost volume and then combined with the reference image to estimate the depth map using an encoder-decoder network. By encoding the information in the cost volume, our method...

10.1109/3dv.2018.00037 article EN 2018 International Conference on 3D Vision (3DV) 2018-09-01

An Efficient B-Spline-Based Kinodynamic Replanning Framework for Quadrotors

Wenchao Ding, Wenliang Gao, Kaixuan Wang, Shaojie Shen

Trajectory replanning for quadrotors is essential to enable fully autonomous flight in unknown environments. Hierarchical motion planning frameworks, which combine path planning with path parameterization, are popular due to their time efficiency. However, the path planning cannot properly deal with nonstatic initial states of the quadrotor, which may result in nonsmooth or even dynamically infeasible trajectories. In this article, we present an efficient kinodynamic replanning framework by exploiting the advantageous properties of the B-spline, which facilitates...

10.1109/tro.2019.2926390 article EN IEEE Transactions on Robotics 2019-08-23

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, and 3 more

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover...

10.1109/iccv51070.2023.00830 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Real-time Scalable Dense Surfel Mapping

Kaixuan Wang, Fei Gao, Shaojie Shen

In this paper, we propose a novel dense surfel mapping system that scales well in different environments with only CPU computation. Using a sparse SLAM system to estimate camera poses, the proposed mapping system can fuse intensity images and depth images into a globally consistent model. The system is carefully designed so that it can build from room-scale environments to urban-scale environments using depth images from RGB-D cameras, stereo cameras or even a monocular camera. First, superpixels extracted from both intensity and depth images are used to model surfels in the system. The superpixel-based surfels make our method runtime...

10.1109/icra.2019.8794101 article EN 2019 International Conference on Robotics and Automation (ICRA) 2019-05-01

Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes

Rui Li, Dong Gong, Wei Yin, Hao Chen, Yu Zhu, and 4 more

Multi-frame depth estimation generally achieves high accuracy relying on the multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in the dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-view cues with monocular cues represented as local monocular depth or features. The improvements are limited due to the uncontrolled quality of the masks and the underutilized benefits of the fusion of the two types of cues. In...

10.1109/cvpr52729.2023.02063 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

SDCluster: A clustering based self-supervised pre-training method for semantic segmentation of remote sensing images

Hanwen Xu, Chenxiao Zhang, Peng Yue, Kaixuan Wang

10.1016/j.isprsjprs.2025.02.021 article EN ISPRS Journal of Photogrammetry and Remote Sensing 2025-03-07

Trajectory Replanning for Quadrotors Using Kinodynamic Search and Elastic Optimization

Wenchao Ding, Wenliang Gao, Kaixuan Wang, Shaojie Shen

We focus on a replanning scenario for quadrotors where considering time efficiency, non-static initial state and dynamical feasibility is of great significance. We propose a real-time B-spline based kinodynamic (RBK) search algorithm, which transforms a position-only shortest path search (such as A* and Dijkstra) into an efficient kinodynamic search, by exploring the properties of the B-spline parameterization. The RBK search is greedy and produces a dynamically feasible time-parameterized trajectory efficiently, which facilitates handling the non-static initial state of the quadrotor. To cope with...

10.1109/icra.2018.8463188 preprint EN 2018-05-01

Autonomous aerial robot using dual‐fisheye cameras

Wenliang Gao, Kaixuan Wang, Wenchao Ding, Fei Gao, Tong Qin, and 1 more

Safety is undoubtedly the most fundamental requirement for any aerial robotic application. It is essential to equip aerial robots with omnidirectional perception coverage to ensure safe navigation in complex environments. In this paper, we present a light‐weight and low‐cost perception system, which consists of two ultrawide field‐of‐view (FOV) fisheye cameras and an inertial measurement unit (IMU). The goal of the system is to achieve spherical sensing coverage with a minimum sensor suite. The two fisheye cameras are mounted rigidly facing upward and downward...

10.1002/rob.21946 article EN Journal of Field Robotics 2020-02-25

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal Estimation

Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, and 5 more

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths, but fail to recover real-world metric scale. Conversely, current normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both...

10.1109/tpami.2024.3444912 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-08-16

Optimal Trajectory Generation for Quadrotor Teach-and-Repeat

Fei Gao, Luqi Wang, Kaixuan Wang, William Wu, Boyu Zhou, and 2 more

In this letter, we propose a novel motion planning framework for quadrotor teach-and-repeat applications. Instead of controlling the drone to precisely follow the teaching path, our method converts an arbitrary jerky human-piloted trajectory to a topologically equivalent one, which is guaranteed to be safe, smooth, and kinodynamically feasible with an expected aggressiveness. Our proposed planning framework optimizes the trajectory in both spatial and temporal aspects. In the spatial layer, a flight corridor is found to represent the free space that is topologically equivalent to the teaching path. Then,...

10.1109/lra.2019.2895110 article EN IEEE Robotics and Automation Letters 2019-01-24

Thermal Defect Detection for Substation Equipment Based on Infrared Image Using Convolutional Neural Network

Kaixuan Wang, Jiaqiao Zhang, Hongjun Ni, Fuji Ren

Thermal defects of substation equipment have a great impact on the stability of power systems. Temperature is crucial for thermal defect detection in infrared images. The traditional detection methods, which have low efficiency and poor accuracy, record the temperature of infrared images manually. In this study, a thermal defect detection method based on infrared images using a convolutional neural network (CNN) is proposed. Firstly, an improved pre-processing method is applied to reduce background information, and the region of interest is located according to the contour and position, hence improving the quality...

10.3390/electronics10161986 article EN Electronics 2021-08-18

Quadtree-Accelerated Real-Time Monocular Dense Mapping

Kaixuan Wang, Wenchao Ding, Shaojie Shen

In this paper, we propose a novel mapping method for robotic navigation. High-quality dense depth maps are estimated and fused into 3D reconstructions in real-time using a single localized moving camera. The quadtree structure of the intensity image is used to reduce the computation burden by estimating the depth map in multiple resolutions. Both the quadtree-based pixel selection and the dynamic belief propagation are proposed to speed up the mapping process: pixels are selected and optimized with the computation resource according to their levels in the quadtree. Solved...

10.1109/iros.2018.8594101 article EN 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018-10-01

Flow-Motion and Depth Network for Monocular Stereo and Beyond

Kaixuan Wang, Shaojie Shen

We propose a learning-based method that solves monocular stereo and can be extended to fuse depth information from multiple target frames. Given two unconstrained images from a monocular camera with known intrinsic calibration, our network estimates the relative camera poses and the depth map of the source image. The core contribution of the proposed method is threefold. First, a network is tailored for static scenes that jointly estimates the optical flow and camera motion. By the joint...

10.1109/lra.2020.2975750 article EN IEEE Robotics and Automation Letters 2020-02-21

Recognition of Rice Sheath Blight Based on a Backpropagation Neural Network

Yi Lu, Zhiyang Li, Xiangqiang Zhao, Shuaishuai Lv, Xingxing Wang, and 2 more

Rice sheath blight is one of the main diseases in rice production. The traditional detection method, which needs manual recognition, is usually inefficient and slow. In this study, a recognition method for identifying rice sheath blight based on a backpropagation (BP) neural network is posed. Firstly, the sample image is smoothed by median filtering and histogram equalization, and the edge of the lesion is segmented using the Sobel operator, which largely reduces the background information and significantly improves the image quality. Then, the corresponding feature parameters...

10.3390/electronics10232907 article EN Electronics 2021-11-24

GIM: Learning Generalizable Image Matcher From Internet Videos

Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias M. Müller, Z.G. Li, and 3 more

Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of existing data construction pipelines, which limits the diversity of standard image matching datasets. To address this problem, we propose GIM, a self-training...

10.48550/arxiv.2402.11095 preprint EN arXiv (Cornell University) 2024-02-16

Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, and 5 more

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions...

10.1109/tpami.2024.3444912 preprint EN arXiv (Cornell University) 2024-03-21

Effects of font size, stroke, and background on the legibility of Chinese characters in virtual reality for the elderly

Yumiao Chen, Gan Huang, Kaixuan Wang

To investigate the legibility of Chinese characters' font size, text background opacity, and stroke for the elderly in virtual reality, we recruited old and young participants to conduct experiments with VR and used eye-tracking technology to record data, task completion time, and error rate. After analysis, we concluded that minimum recognition size is 30 dmm, best 60 which 20 40 dmm people. The style has a significant effect on people (p = 0.000*). Besides, sizes smaller than bigger 50 strokes over 50%...

10.1080/00140139.2024.2392798 article EN Ergonomics 2024-08-17

The Second Monocular Depth Estimation Challenge

Jaime Spencer, C. Stella Qian, Michaela Trescakova, Chris Russell, Simon Hadfield, and 38 more

This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique...

10.1109/cvprw59228.2023.00308 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

Gesture recognition method based on improved YOLOv5 in complex background

Mingming Chai, Xiaobing Zhang, Dongxia Cheng, Wenchao Tong, Kaixuan Wang

In view of the problems of low accuracy and difficult recognition of gesture detection algorithms in complex backgrounds, this paper studies a gesture recognition method based on improved YOLOv5 in complex backgrounds. Firstly, to ensure that the network focuses more on effective channel features in complex background images, the SE attention mechanism is introduced into both the main network and the neck network. Subsequently, without significantly increasing computational complexity, the BiFPN module is integrated to better facilitate multi-scale feature fusion. Finally,...

10.1109/nnice61279.2024.10498303 article EN 2024-01-19

Building Smart City Drone for Graffiti Detection and Clean-up

Shuqin Wang, Jerry Gao, LI Wei-yi, Yanning Li, Kaixuan Wang, and 1 more

Graffiti on buildings and bridges is oftentimes an eyesore. Graffiti on road symbols and signs can even pose safety risks to motorists. Not only is graffiti cleaning costly, it also disrupts normal traffic. Graffiti is a widespread problem in many cities in the U.S. This paper proposes a machine learning approach and an unmanned aerial vehicle (UAV) for graffiti detection and removal. Our solution builds on a smart city framework. The proposed solution is expected to lower the cost and minimize the impact

10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00337 article EN 2019-08-01

Research on Information Recognition of VAT Invoice Based on Computer Vision

Jiaqiao Zhang, Fuji Ren, Hongjun Ni, Zhenya Zhang, Kaixuan Wang

With the promotion of the bill exchange system throughout the world, the use of VAT invoices has exploded. In order to solve the problems of low efficiency, high error rate and high labor intensity in the manual entry of electronic invoices, a method of recognizing invoice information based on computer vision was proposed. Firstly, the invoice image was preprocessed, and tilt correction was implemented by local adaptive threshold and Hough transform. Then the key area was segmented and the target object was taken out by the projection method. Finally, the characters were recognized by OCR...

10.1109/ccis48116.2019.9073749 article EN 2019-12-01

Geometric Pretraining for Monocular Depth Estimation

Kaixuan Wang, Yao Chen, Hengkai Guo, Linfu Wen, Shaojie Shen

ImageNet-pretrained networks have been widely used in transfer learning for monocular depth estimation. These pretrained networks are trained with classification losses, for which only semantic information is exploited while spatial information is ignored. However, both semantic and spatial information is important for per-pixel depth estimation. In this paper, we design a novel self-supervised geometric pretraining task that is tailored for monocular depth estimation using uncalibrated videos. The designed task decouples the structure information from input videos by a simple yet effective conditional...

10.1109/icra40945.2020.9196847 article EN 2020-05-01

Probabilistic Dense Reconstruction from a Moving Camera

Yonggen Ling, Kaixuan Wang, Shaojie Shen

This paper presents a probabilistic approach for online dense reconstruction using a single monocular camera moving through the environment. Compared to spatial stereo, depth estimation from motion stereo is challenging due to insufficient parallaxes, visual scale changes, pose errors, etc. We utilize both the spatial and temporal correlations of consecutive depth estimates to increase the robustness and accuracy of monocular depth estimation. An online, recursive, probabilistic scheme to compute depth estimates, with corresponding covariances and inlier probability...

10.1109/iros.2018.8593618 article EN 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018-10-01
