- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Advanced Image Processing Techniques
- Human Pose and Action Recognition
- Optical Measurement and Interference Techniques
- Image Enhancement Techniques
- 3D Surveying and Cultural Heritage
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Computer Graphics and Visualization Techniques
- Image Processing Techniques and Applications
- Retinal Imaging and Analysis
- Image Retrieval and Classification Techniques
- Remote Sensing and LiDAR Applications
- 3D Shape Modeling and Analysis
- Handwritten Text Recognition Techniques
- Image and Object Detection Techniques
- Robotic Path Planning Algorithms
- Domain Adaptation and Few-Shot Learning
- Image and Video Stabilization
- Parallel Computing and Optimization Techniques
- Digital Image Processing Techniques
- Advanced Optical Sensing Technologies
Shenyang Institute of Automation
2023
Chinese Academy of Sciences
2018-2023
University of Hong Kong
2022-2023
Nanyang Technological University
2023
Qingdao University
2023
Megvii (China)
2022
Vi Technology (United States)
2022
China Academy of Railway Sciences
2022
Nanjing University of Information Science and Technology
2022
Zhejiang University
2021
Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks like 3D map construction, self-localization, parsing the drivable road, and understanding objects, which enable vehicles to reason and act. However, the large-scale data needed for training and system evaluation is still a bottleneck in developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications to autonomous driving. Compared with...
Scene parsing aims to assign a class (semantic) label to each pixel in an image; it is a comprehensive analysis of the image. Given the rise of autonomous driving, pixel-accurate environmental perception is expected to be a key enabling technical piece. However, providing a large-scale dataset for the design and evaluation of scene parsing algorithms, in particular for outdoor scenes, has been difficult. The per-pixel labelling process is prohibitively expensive, limiting the scale of existing datasets. In this paper, we present a large-scale open dataset,...
It has been recently shown that a convolutional neural network can learn optical flow estimation with unsupervised learning. However, the performance of unsupervised methods still shows a relatively large gap compared to their supervised counterparts. Occlusion and large motion are among the major factors that limit current unsupervised learning methods. In this work we introduce a new method which models occlusion explicitly and a new warping scheme that facilitates the learning of large motion. Our method shows promising results on the Flying Chairs, MPI-Sintel, and KITTI benchmark datasets...
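A common way to model occlusion explicitly in unsupervised flow learning is a forward-backward consistency check: a pixel whose forward flow is not cancelled by the backward flow at its landing position is treated as occluded and excluded from the photometric loss. The sketch below illustrates that idea only; the thresholds `alpha1`, `alpha2` and the nearest-neighbor sampling are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Forward-backward consistency check (sketch).

    flow_fw, flow_bw: (H, W, 2) forward and backward flow fields.
    Returns a boolean mask, True where a pixel is considered occluded.
    """
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Where does each pixel land in the second frame? (nearest neighbor)
    x2 = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    y2 = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    bw = flow_bw[y2, x2]                 # backward flow at landing position
    diff = flow_fw + bw                  # should cancel if consistent
    sq_diff = np.sum(diff ** 2, axis=-1)
    bound = alpha1 * (np.sum(flow_fw ** 2, axis=-1)
                      + np.sum(bw ** 2, axis=-1)) + alpha2
    return sq_diff > bound
```

In training, the photometric warping loss would then be multiplied by the inverse of this mask, so occluded pixels do not generate misleading gradients.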
Learning to estimate 3D geometry from a single frame and optical flow from consecutive frames by watching unlabeled videos via a deep convolutional network has made significant progress recently. Current state-of-the-art (SoTA) methods treat the two tasks independently. One typical assumption of existing depth estimation methods is that scenes contain no independently moving objects, while such moving objects can be easily modeled using optical flow. In this paper, we propose to address the two tasks as a whole, i.e., to jointly understand...
In this paper, we propose the convolutional spatial propagation network (CSPN) and demonstrate its effectiveness for various depth estimation tasks. CSPN is a simple and efficient linear propagation model, where the propagation is performed as a series of recurrent convolutional operations and the affinity among neighboring pixels is learned through a deep convolutional neural network (CNN). Compared to the previous state-of-the-art (SOTA) linear propagation model, i.e., spatial propagation networks (SPN), CSPN is 2 to 5× faster in practice. We concatenate CSPN and its variants to SOTA depth estimation networks, which significantly improves the depth accuracy. Specifically,...
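The core of such a linear propagation model is a recurrent update in which each pixel's value becomes a weighted combination of itself and its neighbors, with the weights (affinities) predicted by a CNN. The following minimal sketch shows one propagation step over a 3×3 neighborhood; the affinity normalization (so the weights sum to one and the update is stable) follows the general recipe of such models, and the affinities are taken as plain inputs here rather than CNN outputs.

```python
import numpy as np

def cspn_step(depth, affinity):
    """One recurrent propagation step of a CSPN-like linear model (sketch).

    depth:    (H, W) current depth estimate
    affinity: (8, H, W) affinities toward the 8 neighbors (in the real
              method these come from a CNN; here they are inputs)
    """
    H, W = depth.shape
    # Normalize so |a_k| sums to at most 1; the center pixel keeps the rest.
    norm = np.sum(np.abs(affinity), axis=0, keepdims=True) + 1e-8
    a = affinity / np.maximum(norm, 1.0)
    center = 1.0 - np.sum(a, axis=0)

    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    padded = np.pad(depth, 1, mode="edge")   # replicate border values
    out = center * depth
    for k, (dy, dx) in enumerate(offsets):
        out += a[k] * padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    return out
```

Because all weights sum to one at every pixel, a constant depth map is a fixed point of the update, and repeated steps diffuse information along directions of high affinity.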
Human pose estimation and semantic part segmentation are two complementary tasks in computer vision. In this paper, we propose to solve the two tasks jointly for natural multi-person images, in which the estimated pose provides an object-level shape prior to regularize part segments, while the part-level segments constrain the variation of joint locations. Specifically, we first train two fully convolutional neural networks (FCNs), namely Pose FCN and Part FCN, to provide an initial estimation of the joint potential and the semantic part potential. Then, to refine the joint locations, the two types of potentials are fused with a...
Learning to estimate 3D geometry from a single image by watching unlabeled videos via a deep convolutional network is attracting significant attention. In this paper, we introduce a "3D as-smooth-as-possible (3D-ASAP)" prior inside the pipeline, which enables the joint estimation of edges and the 3D scene, yielding results with improved accuracy for fine, detailed structures. Specifically, we define the 3D-ASAP prior by requiring that any two points recovered in 3D from an image should lie on an existing planar surface if no other cues...
Depth completion deals with the problem of converting a sparse depth map into a dense one, given the corresponding color image. The convolutional spatial propagation network (CSPN) is one of the state-of-the-art (SoTA) methods for depth completion, which recovers the structural details of the scene. In this paper, we propose CSPN++, which further improves its effectiveness and efficiency by learning adaptive convolutional kernel sizes and the number of iterations for the propagation, so that the context and the computational resources needed at each pixel can be...
In this paper, we propose UnOS, a unified system for unsupervised optical flow and stereo depth estimation using convolutional neural networks (CNNs), which takes advantage of their inherent geometrical consistency under the rigid-scene assumption. UnOS significantly outperforms other state-of-the-art (SOTA) approaches that treat the two tasks independently. Specifically, given consecutive stereo image pairs from a video, UnOS estimates per-pixel stereo depth and optical flow images, together with camera ego-motion, with three parallel CNNs. Based...
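The rigid-scene assumption ties the two tasks together: for static scene points, the optical flow is fully determined by depth and camera ego-motion (back-project each pixel, apply the camera motion, re-project). A minimal sketch of this rigid-flow computation, assuming a pinhole camera with intrinsics `K` and motion `(R, t)` (not the paper's exact network outputs):

```python
import numpy as np

def rigid_flow(depth, R, t, K):
    """Optical flow induced by camera ego-motion under the rigid-scene
    assumption (sketch). depth: (H, W); R: (3, 3); t: (3,); K: (3, 3)."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    Kinv = np.linalg.inv(K)
    # Back-project pixels to 3D, transform by (R, t), re-project.
    P = (pix @ Kinv.T) * depth[..., None]
    P2 = P @ R.T + t
    p2 = P2 @ K.T
    p2 = p2[..., :2] / np.maximum(p2[..., 2:3], 1e-8)
    return p2 - np.stack([xs, ys], axis=-1)
```

Comparing this rigid flow against the flow network's output is one way to enforce the cross-task consistency the abstract describes: pixels where the two disagree are candidates for independently moving objects.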
Autonomous driving has attracted remarkable attention from both industry and academia. An important task is to estimate the 3D properties (e.g., translation, rotation, and shape) of a moving or parked vehicle on the road. This task, while critical, is still under-researched in the computer vision community, partially owing to the lack of a large-scale, fully-annotated car database suitable for autonomous driving research. In this paper, we contribute the first large-scale dataset for 3D car instance understanding, ApolloCar3D. The dataset contains 5,277 images...
We propose Super Odometry, a high-precision multi-modal sensor-fusion framework, which provides a simple but effective way to fuse multiple sensors such as LiDAR, camera, and IMU to achieve robust state estimation in perceptually-degraded environments. Different from traditional sensor-fusion methods, Super Odometry employs an IMU-centric data processing pipeline, which combines the advantages of loosely coupled methods with those of tightly coupled methods and recovers the motion in a coarse-to-fine manner. The proposed framework is composed...
This paper presents a novel grid-based NeRF called F<sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup>-NeRF (Fast-Free-NeRF) for novel view synthesis, which enables arbitrary input camera trajectories and costs only a few minutes of training. Existing fast-training NeRF frameworks, like Instant-NGP, Plenoxels, DVGO, or TensoRF, are mainly designed for bounded scenes and rely on space warping to handle unbounded scenes. The two widely-used space-warping methods...
Learning to reconstruct depth from a single image by watching unlabeled videos via a deep convolutional network (DCN) has been attracting significant attention in recent years, e.g. (Zhou et al. 2017). In this paper, we propose to use a surface normal representation in an unsupervised depth estimation framework. Our estimated depths are constrained to be compatible with the predicted normals, yielding more robust geometry results. Specifically, we formulate an edge-aware depth-normal consistency term, and solve it...
Learning to reconstruct depth from a single image by watching unlabeled videos via a deep convolutional network (DCN) has been attracting significant attention in recent years. In this paper, we introduce a surface normal representation into an unsupervised depth estimation framework. Our estimated depths are constrained to be compatible with the predicted normals, yielding more robust geometry results. Specifically, we formulate an edge-aware depth-normal consistency term, and solve it by constructing a depth-to-normal layer...
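The two abstracts above hinge on converting a depth map into per-pixel surface normals so the two predictions can be checked against each other. A minimal sketch of such a depth-to-normal computation, assuming a pinhole camera with intrinsics `fx, fy, cx, cy` and using finite differences plus a cross product (the papers' layer is differentiable and edge-aware; this version shows only the geometry):

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Per-pixel surface normals from a depth map (geometric sketch)."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Back-project each pixel to 3D camera coordinates.
    X = (xs - cx) / fx * depth
    Y = (ys - cy) / fy * depth
    P = np.stack([X, Y, depth], axis=-1)
    # Tangent vectors from finite differences; normal = cross product.
    dx = np.diff(P, axis=1, append=P[:, -1:, :])
    dy = np.diff(P, axis=0, append=P[-1:, :, :])
    n = np.cross(dx, dy)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, 1e-8)
```

A consistency loss would then compare these depth-derived normals with the network's predicted normals, down-weighting the comparison across image edges where depth is legitimately discontinuous.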
Parsing humans into semantic parts is crucial to human-centric analysis. In this paper, we propose a human parsing pipeline that uses pose cues, e.g., estimates of joint locations, to provide pose-guided segment proposals for semantic parts. These proposals are ranked using standard appearance cues, a deep-learned feature, and a novel feature called pose-context. The selected proposals are then assembled with an And-Or graph to output a parse of the person. The approach is able to deal with large variability due to pose, choice of clothing, etc. We evaluate our approach on...
For applications such as augmented reality and autonomous driving, self-localization/camera pose estimation and scene parsing are crucial technologies. In this paper, we propose a unified framework to tackle these two problems simultaneously. The uniqueness of our design is a sensor fusion scheme which integrates camera videos, motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robustness and efficiency of the system. Specifically, we first have an initial coarse pose obtained from consumer-grade...
This article establishes a three-tier mobile edge computing (MEC) network, which takes into account the cooperation between unmanned aerial vehicles (UAVs). In this MEC network, we aim to minimize the processing delay of tasks by jointly optimizing the deployment of UAVs and the offloading decisions, while meeting the computation capacity constraints of the UAVs. However, the resulting optimization problem is nonconvex and cannot be solved by general optimization tools in an effective and efficient way. To this end, we propose a two-layer algorithm to tackle the non-convexity...
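A standard way to handle such a nonconvex joint problem is to alternate between the two blocks of variables: fix the UAV placement and choose offloading decisions, then fix the decisions and re-place the UAV. The sketch below is a generic block-coordinate scheme on a toy delay model, not the paper's algorithm; the cost constants, the single-UAV setting, and the centroid placement rule are all illustrative assumptions.

```python
import math

def total_delay(uav, offload, users):
    """Toy delay: local users pay a fixed local cost; offloading users pay
    a base cost plus a distance-dependent transmission cost."""
    delay = 0.0
    for (ux, uy, local_cost), to_uav in zip(users, offload):
        if to_uav:
            delay += 1.0 + 0.1 * math.hypot(ux - uav[0], uy - uav[1])
        else:
            delay += local_cost
    return delay

def alternate(users, capacity, iters=20):
    """Two-layer alternating scheme (generic sketch): the inner layer picks
    offloading decisions for a fixed UAV position (greedy, capacity-limited);
    the outer layer moves the UAV to the centroid of offloading users."""
    uav = (0.0, 0.0)
    offload = [False] * len(users)
    for _ in range(iters):
        # Inner layer: offload the users that benefit most, up to capacity.
        gains = []
        for i, (ux, uy, local_cost) in enumerate(users):
            off_cost = 1.0 + 0.1 * math.hypot(ux - uav[0], uy - uav[1])
            gains.append((local_cost - off_cost, i))
        gains.sort(reverse=True)
        offload = [False] * len(users)
        for g, i in gains[:capacity]:
            if g > 0:
                offload[i] = True
        # Outer layer: re-place the UAV given the current decisions.
        sel = [u for u, o in zip(users, offload) if o]
        if sel:
            uav = (sum(u[0] for u in sel) / len(sel),
                   sum(u[1] for u in sel) / len(sel))
    return uav, offload, total_delay(uav, offload, users)
```

Each layer can only decrease the toy objective, so the alternation converges to a (generally local) optimum, which is the typical guarantee for such two-layer schemes on nonconvex problems.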