Peng Wang

ORCID: 0000-0002-1265-0233
Research Areas
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Advanced Image Processing Techniques
  • Human Pose and Action Recognition
  • Optical Measurement and Interference Techniques
  • Image Enhancement Techniques
  • 3D Surveying and Cultural Heritage
  • Video Surveillance and Tracking Methods
  • Multimodal Machine Learning Applications
  • Computer Graphics and Visualization Techniques
  • Image Processing Techniques and Applications
  • Retinal Imaging and Analysis
  • Image Retrieval and Classification Techniques
  • Remote Sensing and LiDAR Applications
  • 3D Shape Modeling and Analysis
  • Handwritten Text Recognition Techniques
  • Image and Object Detection Techniques
  • Robotic Path Planning Algorithms
  • Domain Adaptation and Few-Shot Learning
  • Image and Video Stabilization
  • Parallel Computing and Optimization Techniques
  • Digital Image Processing Techniques
  • Advanced Optical Sensing Technologies

Shenyang Institute of Automation
2023

Chinese Academy of Sciences
2018-2023

University of Hong Kong
2022-2023

Nanyang Technological University
2023

Qingdao University
2023

Megvii (China)
2022

Vi Technology (United States)
2022

China Academy of Railway Sciences
2022

Nanjing University of Information Science and Technology
2022

Zhejiang University
2021

Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks like 3D map construction, self-localization, parsing the drivable road and understanding objects, which enable vehicles to reason and act. However, the lack of large-scale datasets for training and system evaluation is still a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with...

10.1109/tpami.2019.2926463 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-07-07

Scene parsing aims to assign a class (semantic) label to each pixel in an image. It is a comprehensive analysis of an image. Given the rise of autonomous driving, pixel-accurate environmental perception is expected to be a key enabling technical piece. However, providing a large-scale dataset for the design and evaluation of scene parsing algorithms, in particular for outdoor scenes, has been difficult. The per-pixel labelling process is prohibitively expensive, limiting the scale of existing ones. In this paper, we present a large-scale open dataset,...

10.1109/cvprw.2018.00141 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01

It has been recently shown that a convolutional neural network can learn optical flow estimation with unsupervised learning. However, the performance of such methods still has a relatively large gap compared to their supervised counterparts. Occlusion and large motion are some of the major factors that limit current unsupervised learning methods. In this work we introduce a new method which models occlusion explicitly and a new warping way that facilitates the learning of large motion. Our method shows promising results on the Flying Chairs, MPI-Sintel and KITTI benchmark datasets....

10.1109/cvpr.2018.00513 preprint EN 2018-06-01
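The occlusion modeling described in the abstract can be illustrated with a minimal numpy sketch: a forward-backward consistency check marks occluded pixels, and the photometric loss is computed only over visible ones. The function names, the nearest-neighbour warp, and the threshold `alpha` are illustrative assumptions, not the paper's implementation (which uses bilinear sampling and a learned network).

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img (H, W) with per-pixel flow (2, H, W).
    Nearest-neighbour sampling for brevity; the paper uses bilinear."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sx = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    sy = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    return img[sy, sx]

def occlusion_mask(fwd, bwd, alpha=0.5):
    """Forward-backward consistency: a pixel is visible when the forward
    flow and the warped backward flow roughly cancel out."""
    bwd_w = np.stack([warp(bwd[0], fwd), warp(bwd[1], fwd)])
    diff = np.linalg.norm(fwd + bwd_w, axis=0)
    return diff < alpha  # True where the pixel is considered visible

def masked_photometric_loss(i1, i2, flow, mask):
    """Photometric error between i1 and the warped i2, restricted to
    non-occluded pixels so occluded regions do not corrupt the loss."""
    i2w = warp(i2, flow)
    err = np.abs(i1 - i2w) * mask
    return err.sum() / (mask.sum() + 1e-8)
```

With identical frames and zero flow the mask is all-visible and the loss vanishes, which is the sanity check one would expect from this construction.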

Learning to estimate 3D geometry in a single frame and optical flow from consecutive frames by watching unlabeled videos via deep convolutional networks has made significant progress recently. Current state-of-the-art (SoTA) methods treat the two tasks independently. One typical assumption of existing depth estimation methods is that scenes contain no independent moving objects, while moving objects could be easily modeled using optical flow. In this paper, we propose to address the two tasks as a whole, i.e., to jointly understand...

10.1109/tpami.2019.2930258 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-07-23

In this paper, we propose the convolutional spatial propagation network (CSPN) and demonstrate its effectiveness for various depth estimation tasks. CSPN is a simple and efficient linear propagation model, where the propagation is performed with a manner of recurrent convolutional operations, in which the affinity among neighboring pixels is learned through a deep convolutional neural network (CNN). Compared to the previous state-of-the-art (SOTA) linear propagation model, i.e., spatial propagation networks (SPN), CSPN is 2 to 5× faster in practice. We concatenate CSPN and its variants to SOTA depth estimation networks, which significantly improve the depth accuracy. Specifically,...

10.1109/tpami.2019.2947374 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-10-15
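The linear propagation at the heart of CSPN can be sketched in a few lines: each pixel's depth is updated as an affinity-weighted combination of its 8 neighbours plus a self term that preserves the remaining weight. This is a simplified single step under the assumption that the raw affinities come from some CNN branch; the normalization scheme mirrors the stability condition in the paper, but the layout of the `affinity` tensor is a hypothetical choice.

```python
import numpy as np

def cspn_step(depth, affinity):
    """One propagation step of a CSPN-style linear model on an H x W map.

    affinity: (8, H, W) raw affinities for the 8 neighbours of each pixel
    (assumed layout; in CSPN these are predicted by a CNN).
    """
    H, W = depth.shape
    # Normalise so each pixel's absolute affinities sum to <= 1,
    # keeping the recurrent linear propagation stable.
    norm = np.abs(affinity).sum(axis=0, keepdims=True) + 1e-8
    a = affinity / norm
    # The self weight absorbs the remaining mass, so weights sum to 1.
    self_w = 1.0 - a.sum(axis=0)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    out = self_w * depth
    padded = np.pad(depth, 1, mode="edge")
    for k, (dy, dx) in enumerate(offsets):
        out += a[k] * padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    return out
```

Because the weights sum to one at every pixel, a constant depth map is a fixed point of the update, a useful invariant when debugging a propagation layer.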

Human pose estimation and semantic part segmentation are two complementary tasks in computer vision. In this paper, we propose to solve the two tasks jointly for natural multi-person images, in which the estimated pose provides an object-level shape prior to regularize part segments, while the part-level segments constrain the variation of joint locations. Specifically, we first train two fully convolutional neural networks (FCNs), namely Pose FCN and Part FCN, to provide an initial estimation of joint potential and semantic part potential. Then, to refine joint locations, the two types of potentials are fused with a...

10.1109/cvpr.2017.644 preprint EN 2017-07-01

Learning to estimate 3D geometry in a single image by watching unlabeled videos via deep convolutional networks is attracting significant attention. In this paper, we introduce a "3D as-smooth-as-possible (3D-ASAP)" prior inside the pipeline, which enables joint estimation of edges and 3D scene, yielding results with significant improvement in accuracy for fine detailed structures. Specifically, we define the 3D-ASAP prior by requiring that any two points recovered in 3D from an image should lie on an existing planar surface if no other cues...

10.1109/cvpr.2018.00031 preprint EN 2018-06-01
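The planarity requirement behind the 3D-ASAP prior can be made concrete with a small check: given a set of recovered 3D points, fit the best plane by SVD and measure the residual. This is only an illustration of the "points should lie on a planar surface" idea, not the paper's loss formulation.

```python
import numpy as np

def plane_residual(pts):
    """RMS distance of 3D points (N, 3) to their best-fit plane.

    The plane normal is the direction of least variance of the centred
    points, obtained from the SVD — a minimal instance of the 3D-ASAP
    idea that recovered points should be as coplanar as possible.
    """
    c = pts.mean(axis=0)
    q = pts - c
    _, _, vh = np.linalg.svd(q, full_matrices=False)
    normal = vh[-1]                      # least-variance direction
    return np.sqrt(np.mean((q @ normal) ** 2))
```

A depth network respecting the prior would drive this residual toward zero on texture-less planar regions while leaving genuine depth edges alone.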

Depth completion deals with the problem of converting a sparse depth map to a dense one, given the corresponding color image. The convolutional spatial propagation network (CSPN) is one of the state-of-the-art (SoTA) methods for depth completion, which recovers structural details of the scene. In this paper, we propose CSPN++, which further improves its effectiveness and efficiency by learning adaptive convolutional kernel sizes and the number of iterations for the propagation, thus the context and computational resource needed at each pixel could be...

10.1609/aaai.v34i07.6635 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

In this paper, we propose UnOS, a unified system for unsupervised optical flow and stereo depth estimation using convolutional neural networks (CNNs), by taking advantage of their inherent geometrical consistency based on the rigid-scene assumption. UnOS significantly outperforms other state-of-the-art (SOTA) approaches that treated the two tasks independently. Specifically, given two consecutive stereo image pairs from a video, UnOS estimates per-pixel stereo depth images, camera ego-motion and optical flow with three parallel CNNs. Based...

10.1109/cvpr.2019.00826 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
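The rigid-scene assumption underlying UnOS has a compact geometric core: for a static scene, the optical flow is fully determined by depth, intrinsics, and camera ego-motion via back-projection and re-projection. The sketch below shows that rigid flow computation in numpy; the function name and argument layout are assumptions for illustration, not the UnOS code.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Optical flow induced by camera motion (R, t) over a static scene,
    given per-pixel depth (H, W) and the intrinsic matrix K (3, 3)."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1)
    # Back-project pixels to 3D, apply the camera motion, re-project.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    proj = K @ (R @ pts + t.reshape(3, 1))
    uv = proj[:2] / proj[2:3]
    flow = uv - pix[:2]
    return flow.reshape(2, H, W)
```

Comparing this rigid flow against the flow predicted by a CNN is what exposes independently moving objects: wherever the two disagree, the rigid-scene assumption is violated.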

Autonomous driving has attracted remarkable attention from both industry and academia. An important task is to estimate the 3D properties (e.g. translation, rotation and shape) of a moving or parked vehicle on the road. This task, while critical, is still under-researched in the computer vision community – partially owing to the lack of a large-scale, fully-annotated car database suitable for autonomous driving research. In this paper, we contribute the first large-scale database for 3D car instance understanding – ApolloCar3D. The dataset contains 5,277 images...

10.1109/cvpr.2019.00560 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We propose Super Odometry, a high-precision multi-modal sensor fusion framework, providing a simple but effective way to fuse multiple sensors such as LiDAR, camera, and IMU to achieve robust state estimation in perceptually-degraded environments. Different from traditional sensor-fusion methods, Super Odometry employs an IMU-centric data processing pipeline, which combines the advantages of loosely coupled methods with tightly coupled methods and recovers motion in a coarse-to-fine manner. The proposed framework is composed...

10.1109/iros51168.2021.9635862 article EN 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021-09-27

This paper presents a novel grid-based NeRF called F2-NeRF (Fast-Free-NeRF) for novel view synthesis, which enables arbitrary input camera trajectories and only costs a few minutes for training. Existing fast grid-based NeRF training frameworks, like Instant-NGP, Plenoxels, DVGO, or TensoRF, are mainly designed for bounded scenes and rely on space warping to handle unbounded scenes. The two widely-used space-warping methods...

10.1109/cvpr52729.2023.00404 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Learning to reconstruct depths from a single image by watching unlabeled videos via a deep convolutional network (DCN) has attracted significant attention in recent years, e.g. (Zhou et al. 2017). In this paper, we propose to use the surface normal representation in an unsupervised depth estimation framework. Our estimated depths are constrained to be compatible with the predicted normals, yielding more robust geometry results. Specifically, we formulate an edge-aware depth-normal consistency term, and solve it...

10.1609/aaai.v32i1.12257 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-27
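The depth-normal consistency idea can be sketched numerically: normals are derived from depth gradients, and tangent vectors between neighbouring 3D points should be orthogonal to the predicted normal, with an edge weight suppressing the term across image edges. This is a simplified orthographic version with unit focal lengths; the paper's depth-to-normal layer uses the full pinhole model, so treat the function names and defaults here as assumptions.

```python
import numpy as np

def normals_from_depth(depth, fx=1.0, fy=1.0):
    """Surface normals from a depth map via finite differences on the
    surface z(x, y); orthographic simplification of the pinhole case."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx * fx, -dz_dy * fy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=0, keepdims=True)

def depth_normal_consistency(depth, normals, edge_weight):
    """Edge-aware consistency: horizontal tangents between neighbouring
    3D points should be orthogonal to the normal, except across image
    edges where edge_weight ~ 0."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs.astype(float), ys.astype(float), depth])
    tx = pts[:, :, 1:] - pts[:, :, :-1]          # horizontal tangents
    dot = np.abs((tx * normals[:, :, :-1]).sum(axis=0))
    w = edge_weight[:, :-1]
    return (w * dot).sum() / (w.sum() + 1e-8)
```

On a planar depth map the term vanishes exactly, which is the geometric fact the consistency loss exploits: inconsistent depth/normal pairs produce a non-zero penalty that the training signal can reduce.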

Learning to reconstruct depths in a single image by watching unlabeled videos via a deep convolutional network (DCN) has attracted significant attention in recent years. In this paper, we introduce a surface normal representation for an unsupervised depth estimation framework. Our estimated depths are constrained to be compatible with the predicted normals, yielding more robust geometry results. Specifically, we formulate an edge-aware depth-normal consistency term, and solve it by constructing a depth-to-normal layer...

10.48550/arxiv.1711.03665 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Parsing humans into semantic parts is crucial to human-centric analysis. In this paper, we propose a human parsing pipeline that uses pose cues, e.g., estimates of human joint locations, to provide pose-guided segment proposals for semantic parts. These proposals are ranked using a standard appearance feature, a deep-learned feature, and a novel feature called pose-context. Then these selected proposals are assembled using an And-Or graph to output a parse of the person. The And-Or graph is able to deal with large human variability due to pose, choice of clothing, etc. We evaluate our approach on...

10.1609/aaai.v30i1.10460 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2016-03-05

For applications such as augmented reality and autonomous driving, self-localization/camera pose estimation and scene parsing are crucial technologies. In this paper, we propose a unified framework to tackle these two problems simultaneously. The uniqueness of our design is a sensor fusion scheme which integrates camera videos, motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robustness and efficiency of the system. Specifically, we first have an initial coarse camera pose obtained from consumer-grade...

10.1109/cvpr.2018.00614 article EN 2018-06-01

This article establishes a three-tier mobile edge computing (MEC) network, which takes into account the cooperation between unmanned aerial vehicles (UAVs). In this MEC network, we aim to minimize the processing delay of tasks by jointly optimizing the deployment of UAVs and the offloading decisions, while meeting the capacity constraint of UAVs. However, the resulting optimization problem is nonconvex, and cannot be solved by general tools in an effective and efficient way. To this end, we propose a two-layer algorithm to tackle the non-convexity...

10.23919/jcc.2022.04.018 article EN China Communications 2022-04-01