Peng Wang

ORCID: 0000-0001-7689-3405
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Topic Modeling
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Visual Attention and Saliency Detection
  • Video Analysis and Summarization
  • Video Surveillance and Tracking Methods
  • Anomaly Detection Techniques and Applications
  • Natural Language Processing Techniques
  • COVID-19 diagnosis using AI
  • Adversarial Robustness in Machine Learning
  • Advanced Graph Neural Networks
  • Machine Learning in Materials Science
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Cloud Computing and Resource Management
  • Advanced Computational Techniques and Applications
  • Handwritten Text Recognition Techniques
  • Optical measurement and interference techniques
  • Advanced Image Processing Techniques
  • Machine Learning and Algorithms
  • CCD and CMOS Imaging Sensors
  • Medical Image Segmentation Techniques

Northwestern Polytechnical University
2019-2024

Lenovo (China)
2024

Qufu Normal University
2023

Shandong Academy of Sciences
2023

Qilu University of Technology
2023

Shandong Institute of Automation
2022-2023

Chinese Academy of Sciences
2022

Space Engineering University
2021

Deepblue Technology (China)
2021

Baidu (China)
2019

In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Building on top of the Faster-RCNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects...

10.1109/cvpr.2018.00422 article EN 2018-06-01

Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty of algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations. It...

10.1609/aaai.v33i01.33018610 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

The success of deep neural networks relies on significant architecture engineering. Recently, neural architecture search (NAS) has emerged as a promise to greatly reduce manual effort in network design by automatically searching for optimal architectures, although typically such algorithms need an excessive amount of computational resources, e.g., a few thousand GPU-days. To date, on challenging vision tasks such as object detection, NAS, especially fast versions of NAS, is less studied. Here we propose to search for the decoder structure of object detectors...

10.1109/cvpr42600.2020.01196 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored in previous Neural Architecture Search (NAS) methods that focus on image classification. This work proposes a novel NAS method, called HR-NAS, which is able to find efficient and accurate networks for different tasks, by effectively encoding multiscale contextual information while maintaining high-resolution representations. In HR-NAS, we renovate...

10.1109/cvpr46437.2021.00300 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Mengze Li, Tianbao Wang, Haoyu Zhang, Shengyu Zhang, Zhou Zhao, Jiaxu Miao, Wenqiao Zhang, Wenming Tan, Jin Wang, Peng Wang, Shiliang Pu, Fei Wu. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.596 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

10.1109/cvpr52733.2024.02563 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

The success of deep neural networks relies on significant architecture engineering. Recently, neural architecture search (NAS) has emerged as a promise to greatly reduce manual effort in network design by automatically searching for optimal architectures, although typically such algorithms need an excessive amount of computational resources, e.g., a few thousand GPU-days. To date, on challenging vision tasks such as object detection, NAS, especially fast versions of NAS, is less studied. Here we propose to search for the decoder structure of object detectors...

10.48550/arxiv.1906.04423 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Camouflaged object detection (COD), which aims to identify the objects that conceal themselves into the surroundings, has recently drawn increasing research efforts in the field of computer vision. In practice, the success of deep learning based COD is mainly determined by two key factors, including (i) a significantly large receptive field, which provides rich context information, and (ii) an effective fusion strategy, which aggregates the multi-level features for accurate COD. Motivated by these observations, in this paper,...

10.48550/arxiv.2101.05687 preprint EN other-oa arXiv (Cornell University) 2021-01-01

In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Building on top of the Faster-RCNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects...

10.48550/arxiv.1712.04837 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Due to the attractive potential in avoiding the elaborate definition of anchor attributes, anchor-free-based deep learning approaches are promising for object detection in remote sensing imagery. CornerNet is one of the most representative methods among anchor-free-based approaches. However, it can be observed distinctly from visual inspection that CornerNet is limited in grouping keypoints, which significantly impacts the detection performance. To address the above problem, a novel and effective approach, called GroupNet, is presented in this paper, which adaptively groups...

10.1016/j.cja.2021.09.016 article EN cc-by-nc-nd Chinese Journal of Aeronautics 2021-10-26

In this paper, we propose an improved YOLOv5 pedestrian detection algorithm to solve the problems of target missing and low detection accuracy in the ROS platform. By adding a small target detection layer of 160*160, the method improves the detection performance of the model and effectively reduces the false detection rate of occluded pedestrians, especially heavily occluded targets. In order to further improve the detection accuracy, it fuses the underlying features of the backbone network to achieve path aggregation with multi-feature fusion. Furthermore, Soft-DIoU-NMS is used for post-detection processing...

10.1109/icras57898.2023.10221598 article EN 2023-06-16

In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images of 260,559 annotated bounding boxes for 2,644 identities appearing in both the UAVs and ground surveillance cameras. To our knowledge, this is the first dataset for cross-platform intelligent surveillance applications, where the UAVs could work as a powerful complement for the ground surveillance cameras. To more realistically simulate the actual scenarios, the surveillance cameras are fixed about 2 meters above the ground, while the UAVs capture videos of persons at different location,...

10.1145/3581783.3612105 preprint EN 2023-10-26

In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coarse polygon annotations for thing instances, it gradually incorporated coarse superpixel annotations for stuff regions, which were subsequently heuristically amalgamated to yield...

10.48550/arxiv.2404.08639 preprint EN arXiv (Cornell University) 2024-04-12

Neural Radiance Fields (NeRF) have shown impressive results in 3D reconstruction and generating novel views. A key challenge within NeRF is the editing of reconstructed scenes, such as object removal, which requires maintaining consistency across multiple views and ensuring high-quality synthesised perspectives. Previous studies have incorporated depth priors, typically from LiDAR or sparse depth measurements provided by COLMAP, to improve the performance of object removal in NeRF. However, these methods are either costly...

10.48550/arxiv.2405.00630 preprint EN arXiv (Cornell University) 2024-05-01

Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively evaluates the capabilities of language understanding, image comprehension, and language-to-image grounding. Consequently, it serves as an ideal testing ground for Multi-modal Large Language Models (MLLMs). In pursuit of this goal, we have established a new REC dataset characterized by two key features: Firstly, it is designed with controllable varying levels of difficulty, necessitating multi-level fine-grained...

10.48550/arxiv.2409.14750 preprint EN arXiv (Cornell University) 2024-09-23

10.18653/v1/2024.emnlp-main.864 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01