Yanwei Pang

ORCID: 0000-0001-6670-3727
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Video Surveillance and Tracking Methods
  • Face and Expression Recognition
  • Image Retrieval and Classification Techniques
  • Image Enhancement Techniques
  • Anomaly Detection Techniques and Applications
  • Human Pose and Action Recognition
  • Advanced MRI Techniques and Applications
  • Advanced Vision and Imaging
  • Advanced Image Processing Techniques
  • Video Analysis and Summarization
  • Visual Attention and Saliency Detection
  • Remote-Sensing Image Classification
  • Medical Imaging Techniques and Applications
  • COVID-19 diagnosis using AI
  • Advanced Image Fusion Techniques
  • Advanced Memory and Neural Computing
  • Image Processing Techniques and Applications
  • Machine Learning and ELM
  • Gait Recognition and Analysis
  • Industrial Vision Systems and Defect Detection
  • EEG and Brain-Computer Interfaces

Tianjin University
2016-2025

Beijing Academy of Artificial Intelligence
2023-2024

Shanghai Artificial Intelligence Laboratory
2023-2024

University of Warwick
2023

Shanghai Center for Brain Science and Brain-Inspired Technology
2022-2023

Inception Institute of Artificial Intelligence
2019

Nokia (China)
2012

Birkbeck, University of London
2008

University of Science and Technology of China
2003-2006

Institute of Automation
2006

Images captured under water are usually degraded due to the effects of absorption and scattering. Degraded underwater images show some limitations when they are used for display and analysis. For example, underwater images with low contrast and color cast decrease the accuracy rate of underwater object detection and marine biology recognition. To overcome those limitations, a systematic underwater image enhancement method, which includes an underwater image dehazing algorithm and a contrast enhancement algorithm, is proposed. Built on a minimum information loss principle, an effective underwater image dehazing algorithm is proposed to restore...

10.1109/tip.2016.2612882 article EN IEEE Transactions on Image Processing 2016-09-22

This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence. Our key idea is to learn a deep summarization network with an attention mechanism to mimic the way humans select keyshots. To this end, we propose a novel video summarization framework named attentive encoder-decoder networks for video summarization (AVS), in which the encoder uses a bidirectional long short-term memory (BiLSTM) to encode the contextual information among...

10.1109/tcsvt.2019.2904996 article EN IEEE Transactions on Circuits and Systems for Video Technology 2019-03-14

10.1016/j.sigpro.2010.08.010 article EN Signal Processing 2010-09-16

Travel route planning is an important step for a tourist to prepare his/her trip. As a common scenario, a tourist usually asks the following questions when he/she is planning a trip in an unfamiliar place: 1) Are there any travel route suggestions for a one-day or three-day trip in Beijing? 2) What is the most popular travel path within the Forbidden City? To facilitate a tourist's trip planning, in this paper, we target solving the problem of automatic travel route planning. We propose to leverage existing travel clues recovered from 20 million geo-tagged photos collected from www.panoramio.com...

10.1145/1873951.1873972 article EN Proceedings of the 18th ACM International Conference on Multimedia 2010-10-25

Convolutional Neural Network (CNN) based methods generally take crowd counting as a regression task by outputting crowd densities. They learn the mapping between image contents and crowd density distributions. Though having achieved promising results, these data-driven counting networks are prone to overestimate or underestimate the people counts of regions with different density patterns, which degrades the whole count accuracy. To overcome this problem, we propose an approach to alleviate the counting performance differences in different regions...

10.1109/cvpr42600.2020.00476 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Pedestrian detection relying on deep convolution neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, the performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusions involving other pedestrians and inter-class occlusions caused by other objects, such as cars and bicycles. These result in a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel...

10.1109/iccv.2019.00507 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Network in network (NiN) is an effective instance and an important extension of the deep convolutional neural network consisting of alternating convolutional layers and pooling layers. Instead of using a linear filter for convolution, NiN utilizes a shallow multilayer perceptron (MLP), a nonlinear function, to replace the linear filter. Because of the powerfulness of MLP convolutions in the spatial domain, NiN has a stronger ability of feature representation and hence results in better recognition performance. However, the MLP itself consists of fully connected layers that give rise to a large...

10.1109/tnnls.2017.2676130 article EN IEEE Transactions on Neural Networks and Learning Systems 2017-03-16

We propose a novel two-stage detection method, D2Det, that collectively addresses both precise localization and accurate classification. For precise localization, we introduce a dense local regression that predicts multiple dense box offsets for an object proposal. Different from traditional regression and keypoint-based localization employed in two-stage detectors, our dense local regression is not limited to a quantized set of keypoints within a fixed region and has the ability to regress position-sensitive real number dense offsets, leading to more precise localization. The dense local regression is further improved by...

10.1109/cvpr42600.2020.01150 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

The cerebellum plays a vital role in motor learning and control with supervised learning capability, while neuromorphic engineering devises diverse approaches to high-performance computation inspired by biological neural systems. This article presents a large-scale cerebellar network model for supervised learning, as well as a cerebellum-inspired neuromorphic architecture to map the cerebellar anatomical structure into the large-scale model. Our multinucleus model and its underpinning architecture contain approximately 3.5 million neurons, upscaling state-of-the-art designs by over 34...

10.1109/tnnls.2021.3057070 article EN IEEE Transactions on Neural Networks and Learning Systems 2021-02-23

Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features. Here we present a comprehensive survey on recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection that includes handcrafted features based methods and deep features based approaches. For handcrafted features based methods, we present an extensive review of approaches and find that handcrafted features with large freedom degrees in shape and space have better performance...

10.1109/tpami.2021.3076733 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-04-30

Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior arts only consider visual contextual clues during captioning but ignore potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we...

10.1109/tmm.2023.3241517 article EN IEEE Transactions on Multimedia 2023-01-01

Dense captioning creates diverse Region of Interest (RoI) descriptions for complex visual scenes. While promising results have been obtained, several issues persist. In particular: 1) it is hard to find the optimal parameters for artificially designed modules (e.g., non-maximum suppression (NMS)), causing redundancies and fewer interactions to benefit the two sub-tasks of RoI detection and RoI captioning; 2) the absence of a multi-scale decoder in current methods hinders the acquisition of scale-invariant features, thus...

10.1109/tmm.2024.3369863 article EN IEEE Transactions on Multimedia 2024-01-01

As a fundamental and challenging task in bridging language and vision domains, Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality, and its key challenge is to measure the semantic similarity across different modalities. Although significant progress has been achieved, existing approaches typically suffer from two major limitations: (1) It hurts the accuracy of the representation by directly exploiting the bottom-up attention based...

10.1109/tip.2023.3348297 article EN IEEE Transactions on Image Processing 2024-01-01

Tensor analysis plays an important role in modern image and vision computing problems. Most of the existing tensor analysis approaches are based on the Frobenius norm, which makes them sensitive to outliers. In this paper, we propose L1-norm-based tensor analysis (TPCA-L1), which is robust to outliers. Experimental results upon face and other datasets demonstrate the advantages of the proposed approach.

10.1109/tcsvt.2009.2020337 article EN IEEE Transactions on Circuits and Systems for Video Technology 2009-04-08

With the prosperity of tourism and Web 2.0 technologies, more and more people are willing to share their travel experiences on the Web (e.g., weblogs, forums, or Web 2.0 communities). These so-called travelogues contain rich information, particularly including location-representative knowledge such as attractions (e.g., Golden Gate Bridge), styles (e.g., beach, history), and activities (e.g., diving, surfing). The location-representative information in travelogues can greatly facilitate other tourists' trip planning, if it can be correctly extracted and summarized. However,...

10.1145/1772690.1772732 article EN 2010-04-26

Restoring an underwater image from a single image is known to be ill-posed, and some assumptions made in previous methods are not suitable for many situations. In this paper, we propose a method based on blue-green channels dehazing and red channel correction for underwater image restoration. Firstly, the blue-green channels are recovered via a dehazing algorithm based on an extension and modification of the Dark Channel Prior algorithm. Then, the red channel is corrected following the Gray-World assumption theory. Finally, in order to resolve the problem in which some recovered image regions may look too dim or too bright, an adaptive...

10.1109/icassp.2016.7471973 article EN 2016-03-01
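
The red channel correction step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one common reading of the Gray-World assumption (channel means in a natural scene should be roughly equal) and is not the paper's actual algorithm. The function name `gray_world_red_correction`, the choice of the green/blue average as the target mean, and the sample pixel values are all hypothetical.

```python
from statistics import mean


def gray_world_red_correction(pixels):
    """Rescale the red channel so its mean matches the green/blue average.

    `pixels` is a list of (r, g, b) tuples with values in [0, 1].
    Illustrative only: the target mean is an assumption, not taken
    from the paper.
    """
    if not pixels:
        return []
    mean_r = mean(p[0] for p in pixels)
    mean_g = mean(p[1] for p in pixels)
    mean_b = mean(p[2] for p in pixels)
    # Gray-World: channel averages should be about equal, so aim the
    # red mean at the average of the other two channel means.
    target = (mean_g + mean_b) / 2.0
    gain = target / mean_r if mean_r > 0 else 1.0
    # Clamp to the valid range; green and blue are left unchanged.
    return [(min(1.0, r * gain), g, b) for r, g, b in pixels]


# Hypothetical pixels with the weak red channel typical of underwater scenes.
sample = [(0.1, 0.4, 0.6), (0.2, 0.5, 0.5), (0.3, 0.6, 0.4)]
corrected = gray_world_red_correction(sample)
```

After correction the mean of the red channel in `corrected` matches the green/blue average of the sample, while the other two channels are untouched.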

This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles the information from three inference processes over the hierarchy: direct inference (directly predicting each part of a human body using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively...

10.1109/iccv.2019.00580 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Human-object interaction detection is an important and relatively new class of visual relationship detection tasks, essential for deeper scene understanding. Most existing approaches decompose the problem into object localization and interaction recognition. Despite showing progress, these approaches only rely on the appearances of humans and objects and overlook the available context information, crucial for capturing subtle interactions between them. We propose a contextual attention framework for human-object interaction detection. Our approach leverages context by...

10.1109/iccv.2019.00579 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Aggregating multi-level features is essential for capturing multi-scale context information for precise scene semantic segmentation. However, the improvement by directly fusing shallow features and deep features becomes limited as the semantic gap between them increases. To solve this problem, we explore two strategies for robust feature fusion. One is enhancing shallow features using a semantic enhancement module (SeEM) to alleviate the semantic gap between shallow features and deep features. The other strategy is feature attention, which involves discovering complementary information (i.e., boundary information) from low-level...

10.1109/iccv.2019.00433 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01