Kaixuan Wang

ORCID: 0000-0001-9210-0233
Research Areas
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Optical measurement and interference techniques
  • Image Processing Techniques and Applications
  • Advanced Image Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Robotic Path Planning Algorithms
  • Image Enhancement Techniques
  • 3D Surveying and Cultural Heritage
  • EEG and Brain-Computer Interfaces
  • Gait Recognition and Analysis
  • Hand Gesture Recognition Systems
  • Image Retrieval and Classification Techniques
  • Vehicle License Plate Recognition
  • Neural Networks and Applications
  • Currency Recognition and Detection
  • Distributed Control Multi-Agent Systems
  • Constructed Wetlands for Wastewater Treatment
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Wastewater Treatment and Nitrogen Removal
  • Advanced Computing and Algorithms
  • Image and Object Detection Techniques
  • Micro and Nano Robotics

Aviation Industry Corporation of China (China)
2024

East China University of Science and Technology
2024

University of Hong Kong
2018-2024

Hong Kong University of Science and Technology
2018-2024

First Affiliated Hospital of Dalian Medical University
2024

Dalian Medical University
2024

Harbin Institute of Technology
2024

University of St Andrews
2024

Anhui University of Technology
2024

Beihang University
2023

Although deep neural networks have been widely applied to computer vision problems, extending them into multiview depth estimation is non-trivial. In this paper, we present MVDepthNet, a convolutional network to solve the depth estimation problem given several image-pose pairs from a localized monocular camera in neighbor viewpoints. Multiview observations are encoded in a cost volume and then combined with the reference image to estimate the depth map using an encoder-decoder network. By encoding the information in the cost volume, our method...

10.1109/3dv.2018.00037 article EN 2018 International Conference on 3D Vision (3DV) 2018-09-01
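A minimal sketch of the cost-volume idea in the MVDepthNet abstract, assuming a plane-sweep construction over a list of depth hypotheses, nearest-neighbour sampling and an absolute intensity difference as the matching cost. The function name `plane_sweep_cost_volume` and the pose convention `X_src = R @ X_ref + t` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def plane_sweep_cost_volume(ref, src, K, R, t, depths):
    """Build a (D, H, W) matching-cost volume for a reference image.

    ref, src: (H, W) grayscale images; K: 3x3 intrinsics shared by both views;
    R, t: relative pose with X_src = R @ X_ref + t; depths: 1-D depth hypotheses.
    Cost is the absolute intensity difference; np.inf marks pixels whose
    projection falls outside the source image or behind the camera.
    """
    H, W = ref.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])  # 3 x N homogeneous pixels
    rays = np.linalg.inv(K) @ pix                           # back-projected rays with z = 1
    ref_flat = ref.ravel()
    cost = np.full((len(depths), H, W), np.inf)
    for i, d in enumerate(depths):
        pts_src = R @ (rays * d) + np.reshape(t, (3, 1))    # hypothesised points in the source frame
        proj = K @ pts_src
        z = proj[2]
        safe_z = np.where(z > 0, z, 1.0)                    # avoid dividing by non-positive depth
        us = np.rint(proj[0] / safe_z).astype(int)          # nearest-neighbour sampling
        vs = np.rint(proj[1] / safe_z).astype(int)
        valid = (z > 0) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
        diff = np.full(H * W, np.inf)
        diff[valid] = np.abs(ref_flat[valid] - src[vs[valid], us[valid]])
        cost[i] = diff.reshape(H, W)
    return cost
```

Taking `np.argmin(cost, axis=0)` gives a naive winner-take-all depth index per pixel; according to the abstract, MVDepthNet instead feeds the volume together with the reference image into an encoder-decoder network.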

Trajectory replanning for quadrotors is essential to enable fully autonomous flight in unknown environments. Hierarchical motion planning frameworks, which combine path planning with path parameterization, are popular due to their time efficiency. However, the path planning cannot properly deal with nonstatic initial states of the quadrotor, which may result in nonsmooth or even dynamically infeasible trajectories. In this article, we present an efficient kinodynamic replanning framework by exploiting the advantageous properties of the B-spline, which facilitates...

10.1109/tro.2019.2926390 article EN IEEE Transactions on Robotics 2019-08-23
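The abstract is truncated before it names which B-spline properties are exploited. As background only, the sketch below evaluates a uniform cubic B-spline in the standard matrix form, which shows the local-control property: each time span depends on just four control points. The function name and the clamping of the final knot are assumptions for illustration.

```python
import numpy as np

# Basis matrix of a uniform cubic B-spline: p(s) = [1, s, s^2, s^3] @ M @ [Q_i, ..., Q_{i+3}]
M = np.array([[1.0, 4.0, 1.0, 0.0],
              [-3.0, 0.0, 3.0, 0.0],
              [3.0, -6.0, 3.0, 0.0],
              [-1.0, 3.0, -3.0, 1.0]]) / 6.0

def eval_uniform_cubic_bspline(ctrl, t, dt):
    """Evaluate a uniform cubic B-spline with knot span dt at time t in [0, (N - 3) * dt].

    ctrl: (N, dim) control points, N >= 4. Each span i is shaped by Q_i..Q_{i+3} only,
    so moving one control point changes the trajectory locally.
    """
    ctrl = np.asarray(ctrl, float)
    n_seg = len(ctrl) - 3
    i = min(int(t // dt), n_seg - 1)  # span index; the final knot belongs to the last span
    s = t / dt - i                    # normalised time inside the span, in [0, 1]
    basis = np.array([1.0, s, s**2, s**3]) @ M
    return basis @ ctrl[i:i + 4]
```

With equally spaced collinear control points the curve reduces to a straight line traversed at constant speed, which is a quick sanity check for the basis matrix.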

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover...

10.1109/iccv51070.2023.00830 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
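To make "affine-invariant depth" concrete: such a prediction is only defined up to an unknown scale and shift, so it is usually compared with ground truth after a per-image least-squares alignment. The sketch below shows that common evaluation step; it is a generic illustration, not code from the paper, and the name `align_scale_shift` is hypothetical.

```python
import numpy as np

def align_scale_shift(pred, gt, mask=None):
    """Least-squares scale s and shift t minimising ||s * pred + t - gt||^2 over valid pixels."""
    pred = np.asarray(pred, float)
    gt = np.asarray(gt, float)
    if mask is None:
        mask = np.isfinite(gt) & (gt > 0)          # default: pixels with valid ground truth
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)     # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t, s, t
```

Because `s` and `t` are fitted from the ground truth itself, an affine-invariant model cannot supply metric scale on its own, which is the gap the abstract points to.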

In this paper, we propose a novel dense surfel mapping system that scales well in different environments with only CPU computation. Using a sparse SLAM system to estimate camera poses, the proposed mapping system can fuse intensity images and depth images into a globally consistent model. The system is carefully designed so that it can build from room-scale to urban-scale environments using RGB-D cameras, stereo cameras or even a monocular camera. First, superpixels extracted from both intensity and depth images are used to model surfels in the system. The superpixel-based surfels make our method runtime...

10.1109/icra.2019.8794101 article EN 2019 International Conference on Robotics and Automation (ICRA) 2019-05-01
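The abstract does not give the surfel update rule. A common way to fuse a new observation into an existing surfel is a confidence-weighted running average, sketched below under that assumption; the `Surfel` fields and `fuse_surfel` are hypothetical names, and keeping the smaller radius is an illustrative choice rather than the paper's method.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    position: np.ndarray  # 3D centre in world coordinates
    normal: np.ndarray    # unit normal
    radius: float         # disc radius
    weight: float         # accumulated confidence

def fuse_surfel(s: Surfel, obs_pos, obs_normal, obs_radius, obs_weight=1.0) -> Surfel:
    """Confidence-weighted average of an existing surfel and a new observation (hypothetical rule)."""
    w = s.weight + obs_weight
    position = (s.weight * s.position + obs_weight * np.asarray(obs_pos, float)) / w
    normal = s.weight * s.normal + obs_weight * np.asarray(obs_normal, float)
    normal = normal / np.linalg.norm(normal)  # re-normalise the averaged normal
    radius = min(s.radius, obs_radius)        # keep the finer (smaller) support
    return Surfel(position, normal, radius, w)
```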

Multi-frame depth estimation generally achieves high accuracy by relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-frame cues with monocular cues represented as local depth or features. The improvements are limited due to the uncontrolled quality of the masks and the underutilized benefits of fusing the two types of cues. In...

10.1109/cvpr52729.2023.02063 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

We focus on a replanning scenario for quadrotors where considering time efficiency, non-static initial state and dynamical feasibility is of great significance. We propose a real-time B-spline based kinodynamic (RBK) search algorithm, which transforms a position-only shortest path search (such as A* and Dijkstra) into an efficient kinodynamic search by exploring the properties of B-spline parameterization. The RBK search is greedy and produces a dynamically feasible time-parameterized trajectory efficiently, which facilitates handling the non-static initial state of the quadrotor. To cope with...

10.1109/icra.2018.8463188 preprint EN 2018-05-01
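One B-spline property often used for fast feasibility checks (the abstract does not say whether RBK uses exactly this test) is that the derivative of a uniform B-spline is again a B-spline whose control points are scaled differences of the originals. By the convex hull property, bounding those control points per axis bounds the velocity and acceleration of the whole curve. The helper names below are hypothetical.

```python
import numpy as np

def bspline_derivative_ctrl(ctrl, dt):
    """Control points of the derivative of a uniform B-spline with knot span dt."""
    return np.diff(np.asarray(ctrl, float), axis=0) / dt

def is_dynamically_feasible(ctrl, dt, v_max, a_max):
    vel = bspline_derivative_ctrl(ctrl, dt)  # velocity control points
    acc = bspline_derivative_ctrl(vel, dt)   # acceleration control points
    # Convex hull property: per-axis bounds on the derivative control points bound the curve.
    return bool(np.all(np.abs(vel) <= v_max) and np.all(np.abs(acc) <= a_max))
```

The check is conservative (sufficient, not necessary) and costs only a few subtractions per control point, which is why it suits a search that expands many candidate segments.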

Safety is undoubtedly the most fundamental requirement for any aerial robotic application. It is essential to equip aerial robots with omnidirectional perception coverage to ensure safe navigation in complex environments. In this paper, we present a lightweight and low-cost system, which consists of two ultrawide field-of-view (FOV) fisheye cameras and an inertial measurement unit (IMU). The goal of the system is to achieve spherical sensing coverage with a minimum sensor suite. The fisheye cameras are mounted rigidly facing upward and downward...

10.1002/rob.21946 article EN Journal of Field Robotics 2020-02-25
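Ultrawide fisheye lenses can see more than a hemisphere, which an ordinary pinhole projection cannot represent. The sketch below uses the equidistant model r = f·θ purely as an illustrative assumption (the abstract does not state which camera model the system uses); `project_equidistant` is a hypothetical name.

```python
import numpy as np

def project_equidistant(point, f, cx, cy):
    """Project a 3D camera-frame point with the equidistant fisheye model r = f * theta.

    theta is the angle from the optical (+z) axis, so points with theta > 90 degrees
    (z < 0, behind the image plane) still map to finite pixel coordinates.
    """
    x, y, z = point
    theta = np.arctan2(np.hypot(x, y), z)  # angle from the optical axis, in [0, pi]
    phi = np.arctan2(y, x)                 # azimuth around the axis
    r = f * theta                          # radial distance from the principal point
    return np.array([cx + r * np.cos(phi), cy + r * np.sin(phi)])
```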

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, which is critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve generalization through affine-invariant depths, but fail to recover real-world scale. Conversely, current normal estimation techniques struggle with performance due to insufficient labeled data. We propose targeted solutions for both...

10.1109/tpami.2024.3444912 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-08-16
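A short illustration of why a single image cannot fix real-world scale, assuming an ideal pinhole camera: scaling the whole scene leaves every pixel unchanged, and so does doubling both the focal length and the depth. The helper `project_pinhole` is a hypothetical name used only for this example.

```python
import numpy as np

def project_pinhole(points, f, cx, cy):
    """points: (N, 3) camera-frame coordinates; returns (N, 2) pixel coordinates."""
    pts = np.asarray(points, float)
    return np.stack([f * pts[:, 0] / pts[:, 2] + cx,
                     f * pts[:, 1] / pts[:, 2] + cy], axis=1)

scene = np.array([[0.5, -0.2, 4.0], [-1.0, 0.3, 6.0]])
img_a = project_pinhole(scene, f=500.0, cx=320.0, cy=240.0)
img_b = project_pinhole(2.0 * scene, f=500.0, cx=320.0, cy=240.0)  # scene twice as large and far
farther = scene * np.array([1.0, 1.0, 2.0])                          # same scene, depths doubled
img_c = project_pinhole(farther, f=1000.0, cx=320.0, cy=240.0)      # seen with double focal length
```

All three projections coincide, so recovering metric depth needs extra information such as the camera intrinsics.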

In this letter, we propose a novel motion planning framework for quadrotor teach-and-repeat applications. Instead of controlling the drone to precisely follow the teaching path, our method converts an arbitrary jerky human-piloted trajectory into a topologically equivalent one, which is guaranteed to be safe, smooth, and kinodynamically feasible with expected aggressiveness. Our proposed planning framework optimizes the trajectory in both spatial and temporal aspects. In the spatial layer, a flight corridor is found to represent the free space that is topologically equivalent to the teaching path. Then,...

10.1109/lra.2019.2895110 article EN IEEE Robotics and Automation Letters 2019-01-24

Thermal defects of substation equipment have a great impact on the stability of power systems. Temperature is crucial for thermal defect detection in infrared images. The traditional methods, which record the temperature in infrared images manually, have low efficiency and poor accuracy. In this study, a method based on a convolutional neural network (CNN) is proposed. Firstly, an improved pre-processing method is applied to reduce background information, and the region of interest is located according to the contour position, hence improving the quality...

10.3390/electronics10161986 article EN Electronics 2021-08-18

In this paper, we propose a novel mapping method for robotic navigation. High-quality dense depth maps are estimated and fused into 3D reconstructions in real time using a single localized moving camera. The quadtree structure of the intensity image is used to reduce the computation burden by estimating the depth map at multiple resolutions. Both quadtree-based pixel selection and dynamic belief propagation are proposed to speed up the mapping process: pixels are selected and optimized with computation resources according to their levels in the quadtree. Solved...

10.1109/iros.2018.8594101 article EN 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018-10-01
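A minimal sketch of building an intensity quadtree, assuming a square power-of-two image and splitting a block whenever its intensity range exceeds a threshold. The splitting criterion, threshold and `quadtree_leaves` name are hypothetical; the abstract only says that the quadtree level decides how pixels are selected and how much computation they receive.

```python
import numpy as np

def quadtree_leaves(img, x0=0, y0=0, size=None, thresh=0.1, min_size=1):
    """Recursively split a square, power-of-two image block while its intensity
    range exceeds thresh. Returns leaves as (x0, y0, size) tuples: large leaves
    cover flat regions, small leaves cover textured regions."""
    if size is None:
        size = img.shape[0]
    block = img[y0:y0 + size, x0:x0 + size]
    if size <= min_size or block.max() - block.min() <= thresh:
        return [(x0, y0, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(img, x0 + dx, y0 + dy, half, thresh, min_size)
    return leaves
```

Under this assumption, a flat wall collapses into a few large leaves that could share one coarse depth estimate, while textured regions keep fine leaves that deserve per-pixel estimation.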

We propose a learning-based method that solves monocular stereo and can be extended to fuse depth information from multiple target frames. Given two unconstrained images from a monocular camera with known intrinsic calibration, our network estimates relative camera poses and the depth map of the source image. The core contribution of the proposed method is threefold. First, a network tailored for static scenes jointly estimates optical flow and camera motion. By joint...

10.1109/lra.2020.2975750 article EN IEEE Robotics and Automation Letters 2020-02-21

Rice sheath blight is one of the main diseases in rice production. The traditional detection method, which needs manual recognition, is usually inefficient and slow. In this study, a recognition method for identifying rice sheath blight based on a backpropagation (BP) neural network is proposed. Firstly, the sample image is smoothed by median filtering and histogram equalization, and the edge of the lesion is segmented using the Sobel operator, which largely reduces background information and significantly improves image quality. Then, the corresponding feature parameters...

10.3390/electronics10232907 article EN Electronics 2021-11-24
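The three pre-processing steps named in the abstract (median filtering, histogram equalization, Sobel edges) can be sketched with NumPy alone. The 3x3 window, edge padding and the `preprocess` helper are assumptions for illustration; the paper's exact parameters are not given in the abstract.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median3x3(img):
    """3x3 median filter with edge padding; removes isolated salt-and-pepper pixels."""
    windows = sliding_window_view(np.pad(img, 1, mode="edge"), (3, 3))  # (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1))

def equalize_hist(img):
    """Classic CDF-based histogram equalization for a uint8 grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:  # constant image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).clip(0, 255).astype(np.uint8)
    return lut[img]

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (correlation, edge padding)."""
    kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    windows = sliding_window_view(np.pad(img.astype(float), 1, mode="edge"), (3, 3))
    gx = np.einsum("ijkl,kl->ij", windows, kx)
    gy = np.einsum("ijkl,kl->ij", windows, kx.T)
    return np.hypot(gx, gy)

def preprocess(img_uint8):
    smoothed = median3x3(img_uint8).astype(np.uint8)  # median of 9 uint8 values is one of them
    return sobel_magnitude(equalize_hist(smoothed))
```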

Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of existing data construction pipelines, which limits the diversity of standard image matching datasets. To address this problem, we propose GIM, a self-training...

10.48550/arxiv.2402.11095 preprint EN arXiv (Cornell University) 2024-02-16

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions...

10.1109/tpami.2024.3444912 preprint EN arXiv (Cornell University) 2024-03-21

To investigate the legibility of Chinese characters' font size, text background opacity, and stroke for the elderly in virtual reality, we recruited old and young participants to conduct experiments with VR and used eye-tracking technology to record data including task completion time and error rate. After analysis, we concluded that the minimum recognition font size is 30 dmm and the best is 60 dmm for the elderly, compared with 20 dmm and 40 dmm for young people. The font style has a significant effect on people (p = 0.000*). Besides, sizes smaller than bigger 50 strokes over 50%...

10.1080/00140139.2024.2392798 article EN Ergonomics 2024-08-17
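The font sizes above are given in dmm (distance-independent millimetres), a VR design unit in which 1 dmm corresponds to 1 mm viewed at 1 m, so the physical size scales with viewing distance while the visual angle stays fixed. The conversion below is a sketch under that definition, assuming the dmm value refers to character height; the helper names are hypothetical.

```python
import math

def dmm_to_mm(size_dmm, distance_m):
    """Physical height in mm of text of size_dmm rendered at distance_m metres."""
    return size_dmm * distance_m

def dmm_to_visual_angle_deg(size_dmm):
    """Visual angle subtended by size_dmm; independent of the viewing distance."""
    return math.degrees(2 * math.atan(size_dmm / 2000.0))
```

For example, 30 dmm text placed 2 m away would be 60 mm tall in the virtual scene, and 60 dmm text subtends roughly 3.4 degrees at any distance.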

This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique...

10.1109/cvprw59228.2023.00308 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

In view of the problems of low accuracy and difficult recognition faced by gesture detection algorithms in complex backgrounds, this paper studies a gesture detection method based on an improved YOLOv5 for complex backgrounds. Firstly, to ensure that the network focuses more on effective channel features in background images, the SE attention mechanism is introduced into both the main network and the neck network. Subsequently, without significantly increasing computational complexity, the BiFPN module is integrated to better facilitate multi-scale feature fusion. Finally,...

10.1109/nnice61279.2024.10498303 article EN 2024-01-19
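The SE (squeeze-and-excitation) attention mentioned above reweights channels using global context. Below is a NumPy sketch of one standard SE block for a single (C, H, W) feature map; the weights are random placeholders rather than trained parameters, and how the block is wired into YOLOv5 in the paper is not reproduced here.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Standard SE block. x: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r).

    Returns the channel-rescaled feature map and the per-channel gates in (0, 1).
    """
    z = x.mean(axis=(1, 2))                   # squeeze: global average pooling -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)          # excitation FC 1 + ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # FC 2 + sigmoid -> (C,)
    return x * s[:, None, None], s            # scale each channel by its gate
```

The reduction ratio r trades parameters for expressiveness; the block adds only two small fully connected layers, consistent with the abstract's concern about computational cost.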

Graffiti on buildings and bridges is oftentimes an eyesore. Graffiti on road symbols and signs can even pose safety risks to motorists. Not only is graffiti cleaning costly, it also disrupts normal traffic. Graffiti is a widespread problem in many cities in the U.S. This paper proposes a machine learning approach for unmanned aerial vehicle (UAV) based graffiti detection and removal. Our solution builds on a smart city framework. The proposed approach is expected to lower the cost and minimize the impact

10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00337 article EN 2019-08-01

With the promotion of the bill exchange system throughout the world, the use of VAT invoices has exploded. In order to solve the problems of low efficiency, high error rate and high labor intensity in the manual entry of electronic invoices, a method for recognizing invoice information based on computer vision was proposed. Firstly, the invoice image was preprocessed, and tilt correction was implemented by local adaptive thresholding and the Hough transform. Then the key area was segmented and the target object was extracted by the projection method. Finally, the characters were recognized by OCR...

10.1109/ccis48116.2019.9073749 article EN 2019-12-01
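Hough-based tilt correction typically finds the dominant straight line (for example a table border) and rotates the image by its angle. The sketch below votes edge points into a (rho, theta) accumulator and returns that angle; the step sizes and the `hough_dominant_tilt` name are assumptions, and the adaptive thresholding that would produce the edge points is not shown.

```python
import numpy as np

def hough_dominant_tilt(points, theta_step_deg=0.5, rho_res=1.0):
    """Angle in degrees of the strongest line through (x, y) points, measured from
    the x-axis in the same coordinate frame as the input (image y usually points down)."""
    pts = np.asarray(points, float)
    thetas = np.deg2rad(np.arange(0.0, 180.0, theta_step_deg))  # line-normal angles
    rhos = pts[:, 0:1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)  # (N, T)
    bins = np.round(rhos / rho_res).astype(int)
    offset = bins.min()
    acc = np.zeros((bins.max() - offset + 1, len(thetas)), dtype=int)
    theta_idx = np.broadcast_to(np.arange(len(thetas)), bins.shape)
    np.add.at(acc, (bins - offset, theta_idx), 1)  # one vote per point per theta
    _, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
    return float(np.rad2deg(thetas[t_idx]) - 90.0)  # line direction = normal - 90 degrees
```

Rotating the image by the negative of the returned angle would level the detected line; the angular resolution is limited by `theta_step_deg`.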

ImageNet-pretrained networks have been widely used in transfer learning for monocular depth estimation. These pretrained networks are trained with classification losses, for which only semantic information is exploited while spatial information is ignored. However, both semantic and spatial information are important for per-pixel depth estimation. In this paper, we design a novel self-supervised geometric pretraining task that is tailored for monocular depth estimation using uncalibrated videos. The designed task decouples the structure information from input videos by a simple yet effective conditional...

10.1109/icra40945.2020.9196847 article EN 2020-05-01

This paper presents a probabilistic approach for online dense reconstruction using a single monocular camera moving through the environment. Compared to spatial stereo, depth estimation from motion stereo is challenging due to insufficient parallaxes, visual scale changes, pose errors, etc. We utilize both the spatial and temporal correlations of consecutive depth estimates to increase the robustness and accuracy of monocular depth estimation. An online, recursive, probabilistic scheme to compute depth estimates, with corresponding covariances and inlier probability...

10.1109/iros.2018.8593618 article EN 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018-10-01
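The recursive part of such a scheme can be illustrated with the simplest case: fusing a Gaussian depth prior with a new Gaussian measurement (a 1-D Kalman update). This sketch covers only the covariance bookkeeping; the inlier probability mentioned in the abstract is not modelled, and `fuse_gaussian` is a hypothetical name.

```python
def fuse_gaussian(mu, var, z, var_z):
    """Fuse prior N(mu, var) with a new depth measurement N(z, var_z)."""
    k = var / (var + var_z)  # gain: how much to trust the new measurement
    return mu + k * (z - mu), (1.0 - k) * var
```

Each fused measurement shrinks the variance, so repeated observations of the same pixel converge toward a confident estimate; an outlier-aware model would additionally down-weight measurements that are unlikely to be inliers.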