Youwei Pang

ORCID: 0000-0002-3950-0956
Research Areas
  • Visual Attention and Saliency Detection
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Face Recognition and Perception
  • Video Surveillance and Tracking Methods
  • Olfactory and Sensory Function Studies
  • Image Enhancement Techniques
  • Image and Video Quality Assessment
  • Virtual Reality Applications and Impacts
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image Fusion Techniques
  • Robotics and Sensor-Based Localization
  • Advanced Vision and Imaging
  • Industrial Vision Systems and Defect Detection
  • Medical Image Segmentation Techniques
  • Spatial Cognition and Navigation
  • Software Engineering Techniques and Practices
  • Software Engineering Research
  • Machine Learning and ELM
  • Image Retrieval and Classification Techniques
  • Data Quality and Management
  • Advanced Battery Technologies Research
  • Remote-Sensing Image Classification
  • Tactile and Sensory Interactions
  • IoT-based Smart Home Systems

Dalian University of Technology
2020-2024

Deep-learning based salient object detection methods have achieved great progress. However, the variable scale and unknown category of objects remain persistent challenges. These are closely related to the utilization of multi-level and multi-scale features. In this paper, we propose aggregate interaction modules to integrate features from adjacent levels, in which less noise is introduced because only small up-/down-sampling rates are used. To obtain more efficient integrated features, self-interaction is embedded in each...

10.1109/cvpr42600.2020.00943 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

The recently proposed camouflaged object detection (COD) attempts to segment objects that are visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between the objects and the background, the objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To deal with these problems, we propose a mixed-scale triplet network, ZoomNet, which mimics the behavior of humans when observing vague images, i.e., zooming in and out...

10.1109/cvpr52688.2022.00220 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Most of the existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interwoven fusion structures to achieve cross-modal information integration. The inherent local connectivity of convolution constrains the performance of convolution-based methods to a ceiling. In this work, we rethink these tasks from the perspective of global alignment and transformation. Specifically, the proposed view-mixed transformer (CAVER) cascades several integration units to construct a top-down transformer-based...

10.1109/tip.2023.3234702 article EN IEEE Transactions on Image Processing 2023-01-01

Accurate medical image segmentation is critical for early diagnosis. Most existing methods are based on a U-shape structure and use element-wise addition or concatenation to fuse features from different levels progressively in the decoder. However, both operations easily generate plenty of redundant information, which weakens the complementarity between features and results in inaccurate localization and blurred edges of lesions. To address this challenge, we propose a general multi-scale subtraction...

10.48550/arxiv.2303.10894 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Existing CNN-based RGB-D salient object detection (SOD) networks are all required to be pretrained on ImageNet to learn hierarchical features, which helps provide a good initialization. However, the collection and annotation of large-scale datasets are time-consuming and expensive. In this paper, we utilize self-supervised representation learning (SSL) to design two pretext tasks: the cross-modal auto-encoder and depth-contour estimation. Our pretext tasks require only a few unlabeled datasets to perform pretraining, which makes the network...

10.1609/aaai.v36i3.20257 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between the objects and the background, the objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To this end, we propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and videos, i.e., zooming in and out. Specifically, our approach employs...

10.1109/tpami.2024.3417329 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-06-21

Location and appearance are the key cues for video object segmentation. Many sources such as RGB, depth, optical flow, and static saliency can provide useful information about the objects. However, existing approaches only utilize RGB or optical flow. In this paper, we propose a novel multi-source fusion network for zero-shot video object segmentation. With the help of the interoceptive spatial attention module (ISAM), the importance of each source is highlighted. Furthermore, we design a feature purification module (FPM) to filter inter-source incompatible...

10.1145/3474085.3475192 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17

Shadow detection methods rely on multi-scale contrast, especially global contrast information, to locate shadows correctly. However, we observe that the camera image signal processor (ISP) tends to preserve more local contrast by sacrificing global contrast during the raw-to-sRGB conversion process. This often causes existing methods to fail in scenes with high global contrast but low local contrast in shadow regions. In this paper, we propose a novel method to detect shadows from raw images. Our key idea is that, instead of performing a many-to-one mapping like the ISP process, we can learn...

10.1109/iccv51070.2023.01167 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Benefiting from its color independence, illumination invariance, and location discrimination, the depth map can provide important supplemental information for extracting salient objects in complex environments. However, high-quality depth sensors are expensive and cannot be widely applied, while general depth sensors produce noisy and sparse depth information, which introduces irreversible interference into depth-based networks. In this paper, we propose a novel multi-task and multi-modal filtered transformer (MMFT) network...

10.1109/tip.2022.3222641 article EN IEEE Transactions on Image Processing 2022-01-01

Most salient object detection approaches use U-Net or feature pyramid networks (FPN) as their basic structures. These methods ignore two key problems when the encoder exchanges information with the decoder: one is the lack of interference control between them, and the other is the failure to consider the disparity in the contributions of different encoder blocks. In this work, we propose a simple gated network (GateNet) to solve both issues at once. With the help of multilevel gate units, valuable context information from the encoder can be optimally transmitted...

10.48550/arxiv.2007.08074 preprint EN other-oa arXiv (Cornell University) 2020-01-01

10.1109/cvpr52733.2024.00376 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

The main challenge of RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information. In this paper, we explore these issues from a new perspective. We integrate the features of different modalities through densely connected structures and use their mixed features to generate dynamic filters with receptive fields of different sizes. In the end, we implement a more flexible and efficient kind of multi-scale feature processing, i.e., the dilated pyramid module. In order to make the predictions have sharper edges and consistent...

10.48550/arxiv.2007.06227 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Existing CNN-based RGB-D salient object detection (SOD) networks are all required to be pretrained on ImageNet to learn hierarchical features, which helps provide a good initialization. However, the collection and annotation of large-scale datasets are time-consuming and expensive. In this paper, we utilize self-supervised representation learning (SSL) to design two pretext tasks: the cross-modal auto-encoder and depth-contour estimation. Our pretext tasks require only a few unlabeled datasets to perform pretraining, which makes the network...

10.48550/arxiv.2101.12482 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Dichotomous Image Segmentation (DIS) has recently emerged as a task aimed at high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of targets in a small receptive field against the loss of details in a large receptive field. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement. The human visual system captures regions of interest by observing them...

10.48550/arxiv.2404.07445 preprint EN arXiv (Cornell University) 2024-04-10

Different from context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, require higher visual understanding ability. Despite the rapid advance of many CD tasks in their respective branches, their isolated evolution leads to limited cross-domain generalisation and repetitive technique innovation. Since there is a strong coupling relationship between foreground and background context in CD tasks, existing methods train separate models focused...

10.48550/arxiv.2405.01002 preprint EN arXiv (Cornell University) 2024-05-02