Wenguan Wang

ORCID: 0000-0002-0802-9567
About
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Visual Attention and Saliency Detection
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • Video Surveillance and Tracking Methods
  • Image and Video Quality Assessment
  • Speech and Dialogue Systems
  • Advanced Vision and Imaging
  • Natural Language Processing Techniques
  • 3D Shape Modeling and Analysis
  • Video Analysis and Summarization
  • Anomaly Detection Techniques and Applications
  • Olfactory and Sensory Function Studies
  • Topic Modeling
  • 3D Surveying and Cultural Heritage
  • Remote Sensing and LiDAR Applications
  • Gaze Tracking and Assistive Technology
  • Advanced Image Processing Techniques
  • Hand Gesture Recognition Systems
  • Generative Adversarial Networks and Image Synthesis
  • Image Retrieval and Classification Techniques
  • Robotic Path Planning Algorithms
  • Medical Image Segmentation Techniques

Zhejiang University of Science and Technology
2024

Zhejiang Lab
2023-2024

Zhejiang University
2023-2024

University of Technology Sydney
2022-2023

Dalian University of Technology
2023

ETH Zurich
2019-2022

University of California, Los Angeles
2018-2021

Board of the Swiss Federal Institutes of Technology
2021

Beijing Institute of Technology
2014-2020

Inception Institute of Artificial Intelligence
2019-2020

As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years. Recent advances in SOD are predominantly led by deep learning-based solutions (named deep SOD). To enable an in-depth understanding of deep SOD, in this paper we provide a comprehensive survey covering various aspects, ranging from algorithm taxonomy to unsolved issues. In particular, we first review deep SOD algorithms from different perspectives, including network architecture, level...

10.1109/tpami.2021.3051099 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-13

This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) video saliency training in the absence of sufficiently large, pixel-wise annotated video data, and 2) fast video saliency detection. The proposed network consists of two modules, for capturing spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We...

10.1109/tip.2017.2754941 article EN IEEE Transactions on Image Processing 2017-09-20

This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). The proposed network, named PAGE-Net, offers two key contributions. The first is the exploitation of an essential pyramid attention structure for salient object detection. It enables the network to concentrate more on salient regions while considering multi-scale saliency information. Such a stacked attention design provides a powerful tool to efficiently improve the representation ability of the corresponding network layer with an enlarged...

10.1109/cvpr.2019.00154 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We introduce an unsupervised, geodesic distance based, salient video object segmentation method. Unlike traditional methods, our method incorporates saliency as prior knowledge for segmentation via the computation of a robust geodesic measurement. We consider two discriminative visual features, spatial edges and temporal motion boundaries, as indicators of foreground object locations. We first generate frame-wise spatiotemporal saliency maps using geodesic distance from these indicators. Building on the observation that foreground areas are surrounded by regions with high edge values,...

10.1109/cvpr.2015.7298961 article EN 2015-06-01
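
The geodesic idea above can be sketched in a few lines. This is a minimal illustration, not the authors' code: a pixel's saliency is taken as its geodesic distance to the frame border, where path cost accumulates an edge-strength map, so pixels enclosed by high-edge regions end up geodesically far from the background. The toy `edge_map` and function name are assumptions for illustration.

```python
# Hypothetical sketch of geodesic saliency: Dijkstra on a 4-connected grid,
# seeded at the frame border; stepping across strong edges is costly.
import heapq
import numpy as np

def geodesic_saliency(edge_map):
    """edge_map: (H, W) array of spatiotemporal edge strengths in [0, 1]."""
    h, w = edge_map.shape
    dist = np.full((h, w), np.inf)
    pq = []
    # All border pixels are background seeds with zero distance.
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                dist[y, x] = 0.0
                heapq.heappush(pq, (0.0, y, x))
    while pq:  # standard Dijkstra expansion
        d, y, x = heapq.heappop(pq)
        if d > dist[y, x]:
            continue
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + edge_map[ny, nx]  # crossing an edge adds its strength
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(pq, (nd, ny, nx))
    return dist

# Toy frame: an edge ring around a central object.
edges = np.zeros((7, 7))
edges[2, 2:5] = edges[4, 2:5] = 1.0
edges[2:5, 2] = edges[2:5, 4] = 1.0
sal = geodesic_saliency(edges)
print(sal[3, 3] > sal[0, 0])  # True: the enclosed center is far from the border
```

The center pixel can only be reached by crossing the ring, so its geodesic distance (saliency) is high, while unenclosed background pixels stay near zero.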

We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve over state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers of our network provide efficient...

10.1109/cvpr.2019.00374 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
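
A vanilla co-attention step between two frames can be sketched as below. This is an assumed, generic formulation, not COSNet itself: frame features are flattened to (HW, C), an affinity matrix relates every location in frame A to every location in frame B, and softmax-normalized affinities re-weight the other frame's features so each frame attends to regions correlated across both.

```python
# Hypothetical co-attention sketch between two flattened frame feature maps.
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(feat_a, feat_b):
    """feat_a, feat_b: (HW, C) flattened frame features."""
    affinity = feat_a @ feat_b.T                  # (HW_a, HW_b) pairwise correlation
    attended_a = softmax(affinity, 1) @ feat_b    # frame B summarized for each A location
    attended_b = softmax(affinity, 0).T @ feat_a  # frame A summarized for each B location
    return attended_a, attended_b

rng = np.random.default_rng(0)
fa, fb = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
att_a, att_b = co_attention(fa, fb)
print(att_a.shape, att_b.shape)  # (16, 8) (16, 8)
```

In practice the affinity is usually computed through a learned weight matrix (`feat_a @ W @ feat_b.T`); the identity-weight version here just shows the information flow.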

The last decade has witnessed a growing interest in video salient object detection (VSOD). However, the research community has long lacked a well-established VSOD dataset representative of real dynamic scenes with high-quality annotations. To address this issue, we elaborately collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames that cover diverse realistic scenes, objects, instances and motions. With corresponding human...

10.1109/cvpr.2019.00875 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Video saliency, aiming for estimation of a single dominant object in a sequence, offers strong object-level cues for unsupervised video segmentation. In this paper, we present a geodesic distance based technique that provides reliable and temporally consistent saliency measurement of superpixels as a prior for pixel-wise labeling. Using undirected intra-frame and inter-frame graphs constructed from spatiotemporal edges or appearance and motion, together with a skeleton abstraction step to further enhance saliency estimates, our method...

10.1109/tpami.2017.2662005 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2017-01-31

In this paper, we aim to predict human eye fixation with view-free scenes based on an end-to-end deep learning architecture. Although convolutional neural networks (CNNs) have made substantial improvements in attention prediction, it is still needed to improve CNN-based attention models by efficiently leveraging multi-scale features. Our visual attention network is proposed to capture hierarchical saliency information, from deep, coarse layers with global saliency information to shallow, fine layers with local saliency response. Our model is based on a skip-layer network structure, which...

10.1109/tip.2017.2787612 article EN IEEE Transactions on Image Processing 2017-12-27

Current semantic segmentation methods focus only on mining "local" context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria (e.g., IoU-like loss). However, they ignore the "global" context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by recent advances in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm for fully...

10.1109/iccv48922.2021.00721 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
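
The core of a supervised pixel-wise contrastive objective can be sketched as below. This is an assumed InfoNCE-style form, not the paper's exact loss: pixel embeddings sharing a class label, possibly sampled from different images, are pulled together, while pixels of other classes are pushed apart.

```python
# Hypothetical supervised pixel contrastive loss (InfoNCE over pixel embeddings).
import numpy as np

def pixel_contrastive_loss(emb, labels, tau=0.1):
    """emb: (N, D) L2-normalized pixel embeddings; labels: (N,) class ids."""
    sim = emb @ emb.T / tau
    losses = []
    for i in range(len(emb)):
        pos = labels == labels[i]
        pos[i] = False                      # exclude the anchor itself
        if not pos.any():
            continue
        mask = np.ones(len(emb), bool)
        mask[i] = False
        denom = np.exp(sim[i, mask]).sum()  # positives + negatives
        # average InfoNCE over all positives of this anchor
        losses.append(np.mean(-np.log(np.exp(sim[i, pos]) / denom)))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
emb = rng.standard_normal((8, 4))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(pixel_contrastive_loss(emb, labels) > 0)  # True
```

Because anchors come from the whole training set rather than one image, the loss exploits exactly the cross-image "global" context the abstract describes.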

We present a novel spatiotemporal saliency detection method to estimate salient regions in videos based on the gradient flow field and energy optimization. The proposed gradient flow field incorporates two distinctive features, 1) intra-frame boundary information and 2) inter-frame motion information, together for indicating salient regions. Based on the effective utilization of both features in the gradient flow field, our algorithm is robust enough to estimate the object and background in complex scenes with various motion patterns and appearances. Then, we introduce local as well as global contrast...

10.1109/tip.2015.2460013 article EN IEEE Transactions on Image Processing 2015-07-22

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D poses as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently captures pose-aligned features and a hierarchy of Bi-directional RNNs (BRNNs) on top to explicitly incorporate a set of knowledge regarding human body configuration (i.e., kinematics, symmetry, motor coordination). The model thus enforces high-level constraints over human poses. In learning, we develop a sample...

10.1609/aaai.v32i1.12270 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-27

We study the problem of photo cropping, which aims to find a cropping window of an input image that preserves as much as possible of its important parts while being aesthetically pleasant. Seeking a deep learning-based solution, we design a neural network that has two branches for attention box prediction (ABP) and aesthetics assessment (AA), respectively. Given an image, the ABP branch predicts a bounding box of the initial minimum cropping window, around which a set of cropping candidates are generated with little loss of important information. Then, the AA branch is employed to select...

10.1109/tpami.2018.2840724 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-05-25

We present a novel image superpixel segmentation approach using the proposed lazy random walk (LRW) algorithm in this paper. Our method begins with initializing the seed positions and runs the LRW algorithm on the input image to obtain the probabilities of each pixel. Then, the boundaries of the initial superpixels are obtained according to the commute time. The superpixels are iteratively optimized by a new energy function, which is defined on the commute time and texture measurement. The LRW algorithm with self-loops has the merits of segmenting weak boundaries and complicated texture regions very well using the global probability maps...

10.1109/tip.2014.2302892 article EN IEEE Transactions on Image Processing 2014-01-31

In this paper, we propose a real-time image superpixel segmentation method running at 50 frames/s by using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. In order to decrease the computational costs of superpixel algorithms, we adopt a fast two-step framework. In the first clustering stage, the DBSCAN algorithm with color-similarity and geometric restrictions is used to rapidly cluster the pixels; then, small clusters are merged into superpixels by their neighborhood through a distance measurement defined with color features in the second...

10.1109/tip.2016.2616302 article EN IEEE Transactions on Image Processing 2016-10-11
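
The two-step idea can be illustrated with a toy implementation. This is an assumption-laden sketch, not the paper's method: step 1 grows clusters over 4-connected pixels whose colors differ by less than a threshold (a DBSCAN-style density/similarity criterion), and step 2 merges clusters smaller than a minimum size into the neighboring cluster with the closest mean color. Thresholds and the `superpixels` helper are made up for illustration.

```python
# Hypothetical two-step superpixel sketch: color-similarity region growing,
# then small-cluster merging by nearest mean color among neighbors.
import numpy as np
from collections import deque

def superpixels(img, color_eps=30.0, min_size=4):
    h, w, _ = img.shape
    label = -np.ones((h, w), int)
    cur = 0
    for sy in range(h):
        for sx in range(w):
            if label[sy, sx] != -1:
                continue
            q = deque([(sy, sx)])  # step 1: grow a cluster by color similarity
            label[sy, sx] = cur
            while q:
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and label[ny, nx] == -1
                            and np.linalg.norm(img[ny, nx] - img[y, x]) < color_eps):
                        label[ny, nx] = cur
                        q.append((ny, nx))
            cur += 1
    # step 2: merge undersized clusters into the closest-colored neighbor
    for c in range(cur):
        ys, xs = np.where(label == c)
        if 0 < len(ys) < min_size:
            mean_c = img[ys, xs].mean(0)
            best, best_d = None, np.inf
            for y, x in zip(ys, xs):
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and label[ny, nx] != c:
                        n = label[ny, nx]
                        d = np.linalg.norm(img[label == n].mean(0) - mean_c)
                        if d < best_d:
                            best, best_d = n, d
            if best is not None:
                label[label == c] = best
    return label

# Toy image: left half dark, right half bright, one stray bright pixel.
img = np.zeros((6, 6, 3)); img[:, 3:] = 200.0; img[0, 0] = 200.0
lab = superpixels(img.astype(float))
print(len(np.unique(lab)))  # 2: the stray single-pixel cluster was merged away
```

The real method adds geometric restrictions in step 1 and is engineered for speed; this sketch only shows why the second merging pass is needed to guarantee a minimum superpixel size.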

This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this task as a process of iterative information fusion over video graphs. Specifically, AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges. The underlying pair-wise relations are described by a differentiable attention mechanism. Through parametric message passing, AGNN is able to capture and mine much richer and higher-order relations between frames,...

10.1109/iccv.2019.00933 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Predicting where people look in static scenes, a.k.a. visual saliency, has received significant research interest recently. However, relatively less effort has been spent on understanding and modeling visual attention over dynamic scenes. This work makes three contributions to video saliency research. First, we introduce a new benchmark, called DHF1K (Dynamic Human Fixation 1K), for predicting fixations during dynamic scene free-viewing, which is a long-time need in this field. DHF1K consists of 1K high-quality...

10.1109/tpami.2019.2924417 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-06-25

This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g., fashion landmark localization and clothing category classification. The suggested model is leveraged with high-level human knowledge in this domain. We propose two important fashion grammars: (i) dependency grammar capturing kinematics-like relations, and (ii) symmetry grammar accounting for the bilateral symmetry of clothes. We introduce Bidirectional Convolutional Recurrent Neural Networks (BCRNNs) for efficiently approaching message...

10.1109/cvpr.2018.00449 article EN 2018-06-01

This paper presents a salient object detection method that integrates both top-down and bottom-up saliency inference in an iterative and cooperative manner. The bottom-up process is used for coarse-to-fine saliency estimation, where high-level saliency is gradually integrated with finer lower-layer features to obtain a fine-grained result. The top-down process infers the high-level, but rough, saliency through using upper-layer, semantically richer features. These two processes are alternately performed, where one uses the estimate obtained from the other to yield an enhanced estimate, and this process,...

10.1109/cvpr.2019.00612 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. By elaborately annotating three popular video segmentation datasets (DAVIS, Youtube-Objects and SegTrack V2) with dynamic eye-tracking data in the UVOS setting, for the first time, we quantitatively verified the high consistency of visual attention behavior among human observers, and found strong correlation between human attention and explicit primary object judgements during dynamic, task-driven viewing. Such novel...

10.1109/cvpr.2019.00318 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such a parametric regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for...

10.1109/cvpr52688.2022.00261 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
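
The prototype view can be made concrete with a toy sketch. This is an assumption, not the paper's method: each class is represented by prototypes computed directly from training pixel embeddings (here, crude per-class sub-cluster means rather than learned softmax weights), and a pixel is labeled by its nearest prototype under cosine similarity. All names and the two-prototype choice are illustrative.

```python
# Hypothetical nonparametric prototype classifier over pixel embeddings.
import numpy as np

def build_prototypes(emb, labels, per_class=2, seed=0):
    """Derive per_class prototypes for each class from its embeddings."""
    rng = np.random.default_rng(seed)
    protos, proto_cls = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        for chunk in np.array_split(idx, per_class):  # crude sub-clustering
            p = emb[chunk].mean(0)
            protos.append(p / np.linalg.norm(p))
            proto_cls.append(c)
    return np.stack(protos), np.array(proto_cls)

def classify(emb, protos, proto_cls):
    """Assign each embedding the class of its most cosine-similar prototype."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return proto_cls[np.argmax(emb @ protos.T, axis=1)]

rng = np.random.default_rng(2)
train = np.vstack([rng.normal(0, 0.1, (20, 4)) + v
                   for v in (np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]))])
labels = np.repeat([0, 1], 20)
protos, pcls = build_prototypes(train, labels)
pred = classify(np.array([[0.9, 0.1, 0.0, 0.0]]), protos, pcls)
print(pred[0])  # 0
```

Nothing here is learned at decode time: changing the training embeddings changes the prototypes directly, which is the nonparametric property the abstract contrasts with learnable weight/query vectors.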

This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG). The proposed model is based on a triple-branch encoder-decoder architecture. The first two branches are learned for sharpening FG humans and BG details, respectively; while the third one produces global, harmonious results by comprehensively fusing multi-scale deblurring information from the two domains. The proposed model is further endowed with a supervised attention mechanism in an end-to-end fashion. It learns a soft mask...

10.1109/iccv.2019.00567 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01