- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Optical measurement and interference techniques
- Image Processing Techniques and Applications
- Advanced Image Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Robotic Path Planning Algorithms
- Image Enhancement Techniques
- 3D Surveying and Cultural Heritage
- EEG and Brain-Computer Interfaces
- Gait Recognition and Analysis
- Hand Gesture Recognition Systems
- Image Retrieval and Classification Techniques
- Vehicle License Plate Recognition
- Neural Networks and Applications
- Currency Recognition and Detection
- Distributed Control Multi-Agent Systems
- Constructed Wetlands for Wastewater Treatment
- Generative Adversarial Networks and Image Synthesis
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Wastewater Treatment and Nitrogen Removal
- Advanced Computing and Algorithms
- Image and Object Detection Techniques
- Micro and Nano Robotics
Aviation Industry Corporation of China (China)
2024
East China University of Science and Technology
2024
University of Hong Kong
2018-2024
Hong Kong University of Science and Technology
2018-2024
First Affiliated Hospital of Dalian Medical University
2024
Dalian Medical University
2024
Harbin Institute of Technology
2024
University of St Andrews
2024
Anhui University of Technology
2024
Beihang University
2023
Although deep neural networks have been widely applied to computer vision problems, extending them into multiview depth estimation is non-trivial. In this paper, we present MVDepthNet, a convolutional network to solve the depth estimation problem given several image-pose pairs from a localized monocular camera in neighbor viewpoints. Multiview observations are encoded in a cost volume and then combined with the reference image to estimate the depth map using an encoder-decoder network. By encoding the information into the cost volume, our method...
Trajectory replanning for quadrotors is essential to enable fully autonomous flight in unknown environments. Hierarchical motion planning frameworks, which combine path planning with path parameterization, are popular due to their time efficiency. However, the path planning cannot properly deal with nonstatic initial states of the quadrotor, which may result in nonsmooth or even dynamically infeasible trajectories. In this article, we present an efficient kinodynamic replanning framework by exploiting the advantageous properties of the B-spline, which facilitates...
Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover...
In this paper, we propose a novel dense surfel mapping system that scales well in different environments with only CPU computation. Using a sparse SLAM system to estimate camera poses, the proposed mapping system can fuse intensity images and depth images into a globally consistent model. The system is carefully designed so that it can build from room-scale environments to urban-scale environments using depth images from RGB-D cameras, stereo cameras or even a monocular camera. First, superpixels extracted from both intensity and depth images are used to model surfels in the system. The superpixel-based surfels make our method runtime...
Multi-frame depth estimation generally achieves high accuracy relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in the dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-view cues with monocular cues represented as local monocular depth or features. The improvements are limited due to the uncontrolled quality of the masks and the underutilized benefits of the fusion of the two types of cues. In...
We focus on a replanning scenario for quadrotors where considering time efficiency, non-static initial state and dynamical feasibility is of great significance. We propose a real-time B-spline based kinodynamic (RBK) search algorithm, which transforms a position-only shortest path search (such as A* and Dijkstra) into an efficient kinodynamic search, by exploring the properties of B-spline parameterization. The RBK search is greedy and produces a dynamically feasible time-parameterized trajectory efficiently, which facilitates the non-static initial state of the quadrotor. To cope with...
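As an illustration of the kind of B-spline property the abstract above refers to, the sketch below uses the convex hull property of a uniform B-spline: the derivative of the spline is itself a B-spline whose control points are scaled differences of the original control points, so bounding those derived control points is a sufficient (conservative) condition for bounding velocity and acceleration along the whole trajectory. The truncated abstract does not say which properties the RBK search relies on, so this is an assumption; the function names, the per-axis limits `v_max` and `a_max`, and the knot span `dt` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def derivative_control_points(ctrl: Sequence[Point], dt: float) -> List[Point]:
    """Control points of the derivative of a uniform B-spline with knot span dt."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Each derivative control point is the scaled difference of two neighbours.
    return [
        tuple((b - a) / dt for a, b in zip(p, q))
        for p, q in zip(ctrl, ctrl[1:])
    ]


def is_dynamically_feasible(
    ctrl: Sequence[Point], dt: float, v_max: float, a_max: float
) -> bool:
    """Sufficient per-axis check: all velocity and acceleration control points
    stay within the limits, hence (by the convex hull property) so does the
    whole trajectory. A False result does not prove infeasibility."""
    vel = derivative_control_points(ctrl, dt)
    acc = derivative_control_points(vel, dt)
    vel_ok = all(abs(c) <= v_max for v in vel for c in v)
    acc_ok = all(abs(c) <= a_max for a in acc for c in a)
    return vel_ok and acc_ok
```

Because the check only looks at control points, it can be evaluated incrementally while a search expands candidate control points, which is one plausible reason such properties are attractive for real-time replanning.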
Safety is undoubtedly the most fundamental requirement for any aerial robotic application. It is essential to equip aerial robots with omnidirectional perception coverage to ensure safe navigation in complex environments. In this paper, we present a light-weight and low-cost omnidirectional perception system, which consists of two ultrawide field-of-view (FOV) fisheye cameras and an inertial measurement unit (IMU). The goal of the system is to achieve spherical omnidirectional sensing coverage with the minimum sensor suite. The two fisheye cameras are mounted rigidly facing upward and downward...
We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths, but fail to recover real-world metric scale. Conversely, current normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both...
In this letter, we propose a novel motion planning framework for quadrotor teach-and-repeat applications. Instead of controlling the drone to precisely follow the teaching path, our method converts an arbitrary jerky human-piloted trajectory to a topologically equivalent one, which is guaranteed to be safe, smooth, and kinodynamically feasible with an expected aggressiveness. Our proposed planning framework optimizes the trajectory in both spatial and temporal aspects. In the spatial layer, a flight corridor is found to represent the free space that is topologically equivalent to the teaching path. Then,...
Thermal defects of substation equipment have a great impact on the stability of power systems. Temperature is crucial for thermal defect detection in infrared images. The traditional methods, which have low efficiency and poor accuracy, record the temperature of infrared images manually. In this study, a method based on infrared images using a convolutional neural network (CNN) is proposed. Firstly, improved pre-processing is applied to reduce the background information, and the region of interest is located according to the contour position, hence improving the quality...
In this paper, we propose a novel mapping method for robotic navigation. High-quality dense depth maps are estimated and fused into 3D reconstructions in real-time using a single localized moving camera. The quadtree structure of the intensity image is used to reduce the computation burden by estimating the depth map in multiple resolutions. Both the quadtree-based pixel selection and the dynamic belief propagation are proposed to speed up the mapping process: pixels are selected and optimized with the computation resource according to their levels in the quadtree. Solved...
We propose a learning-based method that solves monocular stereo and can be extended to fuse depth information from multiple target frames. Given two unconstrained images from a monocular camera with known intrinsic calibration, our network estimates the relative camera poses and the depth map of the source image. The core contribution of the proposed method is threefold. First, a network is tailored for static scenes that jointly estimates the optical flow and camera motion. By the joint...
Rice sheath blight is one of the main diseases in rice production. The traditional detection method, which needs manual recognition, is usually inefficient and slow. In this study, a recognition method for identifying rice sheath blight based on a backpropagation (BP) neural network is proposed. Firstly, the sample image is smoothed by median filtering and histogram equalization, and the edge of the lesion is segmented using the Sobel operator, which largely reduces the background information and significantly improves the image quality. Then, the corresponding feature parameters...
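Two of the pre-processing steps named in the abstract above, median filtering and the Sobel operator, can be sketched in plain Python as follows. This is a minimal illustration of the standard 3x3 versions of those operators on a grayscale image stored as a list of rows, not the authors' pipeline; the function names are hypothetical, border pixels are simply left unfiltered, and histogram equalization is omitted.

```python
import math
from statistics import median
from typing import List

Image = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def median_filter3(img: Image) -> Image:
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude from the 3x3 Sobel kernels; borders are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out
```

Running the median filter first removes isolated noisy pixels that would otherwise produce spurious strong gradients in the Sobel output, which is consistent with the ordering described in the abstract.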
Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of existing data construction pipelines, which limits the diversity of standard image matching datasets. To address this problem, we propose GIM, a self-training...
We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions...
To investigate the legibility of Chinese characters' font size, text background opacity, and stroke for the elderly in virtual reality, we recruited old and young participants to conduct experiments with VR and used eye-tracking technology to record data, task completion time, and error rate. After analysis, we concluded that minimum recognition size is 30 dmm, best 60 which 20 40 dmm people. The style has a significant effect on people (p = 0.000*). Besides, sizes smaller than bigger 50 strokes over 50%...
This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique...
In view of the problems of low accuracy and difficult recognition of gesture detection algorithms in complex backgrounds, in this paper, a gesture recognition method based on improved YOLOv5 in complex backgrounds is studied. Firstly, to ensure that the network focuses more on effective channel features in complex background images, the SE attention mechanism is introduced into both the main network and the neck network. Subsequently, without significantly increasing the computational complexity, the BiFPN module is integrated to better facilitate multi-scale feature fusion. Finally,...
Graffiti on buildings and bridges are oftentimes an eyesore. Those on road symbol signs can even pose safety risks to motorists. Not only is graffiti cleaning costly, it also disrupts normal traffic. Graffiti is a widespread problem in many cities in the U.S. This paper proposes a machine learning approach to unmanned aerial vehicle (UAV) graffiti detection and removal. Our solution builds on a smart city framework. The proposed approach is expected to lower the cost and minimize the impact
With the promotion of the bill exchange system throughout the world, the use of VAT invoices has exploded. In order to solve the problems of low efficiency, high error rate and high labor intensity of manual entry of electronic invoices, a method of recognizing invoice information based on computer vision was proposed. Firstly, the invoice image was preprocessed, and tilt correction was implemented by local adaptive threshold and Hough transform. Then the key area was segmented and the target object was taken out by the projection method. Finally, the characters were recognized by OCR...
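The "projection method" mentioned in the abstract above usually refers to summing foreground pixels along rows or columns of a binarized image and cutting wherever the profile drops to zero. The sketch below shows that common reading on a binary image stored as a list of rows; the abstract gives no details, so the function names and the `min_count` threshold are assumptions made for illustration.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = foreground (ink), 0 = background


def projection_profile(binary: Binary, axis: str) -> List[int]:
    """Count foreground pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in binary]
    if axis == "cols":
        return [sum(col) for col in zip(*binary)]
    raise ValueError("axis must be 'rows' or 'cols'")


def segment_runs(profile: List[int], min_count: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where profile >= min_count."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i  # a new foreground run begins
        elif value < min_count and start is not None:
            runs.append((start, i))  # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs
```

Applying `segment_runs` to the row profile yields text-line bands, and applying it to the column profile within one band yields candidate character or field boundaries that can then be passed to an OCR step.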
ImageNet-pretrained networks have been widely used in transfer learning for monocular depth estimation. These pretrained networks are trained with classification losses for which only semantic information is exploited while spatial information is ignored. However, both semantic and spatial information is important for per-pixel depth estimation. In this paper, we design a novel self-supervised geometric pretraining task that is tailored for monocular depth estimation using uncalibrated videos. The designed task decouples the structure information from input videos by a simple yet effective conditional...
This paper presents a probabilistic approach for online dense reconstruction using a single monocular camera moving through the environment. Compared to spatial stereo, depth estimation from motion stereo is challenging due to insufficient parallaxes, visual scale changes, pose errors, etc. We utilize both the spatial and temporal correlations of consecutive depth estimates to increase the robustness and accuracy of monocular depth estimation. An online, recursive, probabilistic scheme to compute depth estimates, with corresponding covariances and inlier probability...