Weiming Dong

ORCID: 0000-0001-6502-145X
About
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image and Video Retrieval Techniques
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • Visual Attention and Saliency Detection
  • Multimodal Machine Learning Applications
  • Image Retrieval and Classification Techniques
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Advanced Vision and Imaging
  • Advanced Image Processing Techniques
  • 3D Shape Modeling and Analysis
  • Video Analysis and Summarization
  • Aesthetic Perception and Analysis
  • Music and Audio Processing
  • Face recognition and analysis
  • Human Motion and Animation
  • Advanced Image Fusion Techniques
  • Video Surveillance and Tracking Methods
  • Music Technology and Sound Studies
  • Image Processing Techniques and Applications
  • Cancer-related molecular mechanisms research
  • Speech and Audio Processing
  • 3D Surveying and Cultural Heritage
  • Image Processing and 3D Reconstruction

Chinese Academy of Sciences
2016-2025

Institute of Automation
2016-2025

Beijing Academy of Artificial Intelligence
2020-2024

University of Chinese Academy of Sciences
2018-2024

Shandong Institute of Automation
2009-2024

University College of Applied Science
2023

Institute of Automation
2009-2021

Jilin University
2021

Shandong Institute of Business and Technology
2021

Shandong University
2021

Object detection has achieved remarkable progress in the past decade. However, detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects usually have diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited availability of datasets hinders development on this task. To resolve the first two issues, we...

10.1109/cvpr42600.2020.01122 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
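The mismatch the abstract above points to, between axis-aligned receptive fields and rotated objects, can be made concrete with a small geometry sketch. This is not the paper's method; `obb_corners` and `aabb_area_ratio` are illustrative helpers showing how much extra area an axis-aligned cover must include once an elongated box is rotated.

```python
import math

def obb_corners(cx, cy, w, h, theta):
    """Corner points of an oriented box (center, size, rotation theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners

def aabb_area_ratio(cx, cy, w, h, theta):
    """Ratio of the axis-aligned bounding box area to the oriented box area:
    a rough measure of how much background an axis-aligned receptive field
    covers when the object is rotated."""
    xs, ys = zip(*obb_corners(cx, cy, w, h, theta))
    aabb_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return aabb_area / (w * h)
```

For a 100x20 box the ratio is 1.0 when unrotated but grows past 3.5 at a 45-degree rotation, which is exactly the regime where axis-aligned features mostly see background.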

The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account and propose a transformer-based approach called StyTr2. In contrast with visual transformers for other vision...

10.1109/cvpr52688.2022.01104 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Arbitrary style transfer is a significant topic with both research value and application prospects. Given a content image and a referenced style painting, a desired style transfer would render the content image with the color tone and vivid stroke patterns of the painting while synchronously maintaining the detailed content structure information. Style transfer approaches initially learn representations of the style and content references and then generate stylized images guided by these representations. In this paper, we propose a multi-adaptation network which involves two self-adaptation (SA)...

10.1145/3394171.3414015 article EN Proceedings of the 30th ACM International Conference on Multimedia 2020-10-12

The artistic style within a painting is the means of expression, which includes not only the painting material, colors, and brushstrokes, but also high-level attributes, including semantic elements and object shapes. Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey semantic elements. Pre-trained text-to-image synthesis diffusion probabilistic models have achieved remarkable quality but require extensive textual descriptions to accurately portray the attributes of a particular...

10.1109/cvpr52729.2023.00978 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

In this work, we tackle the challenging problem of arbitrary image style transfer using a novel style feature representation learning method. A suitable style representation, as a key component in image stylization tasks, is essential to achieve satisfactory results. Existing deep neural network based approaches achieve reasonable results with guidance from second-order statistics such as the Gram matrix of content features. However, they do not leverage sufficient style information, which leads to artifacts such as local distortions and...

10.1145/3528233.3530736 preprint EN 2022-07-20
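The "second-order statistics such as the Gram matrix" that the abstract above contrasts against can be sketched in a few lines of NumPy. This is the classic Gram-based style representation, not this paper's contrastive method; the function names are my own.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a CNN feature map of shape (C, H, W): the channel-wise
    second-order statistic classically used as a style representation."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)  # (C, C), normalized by spatial size

def style_loss(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    return float(np.mean((gram_matrix(feat_a) - gram_matrix(feat_b)) ** 2))
```

Because the Gram matrix discards all spatial arrangement, two images with very different layouts can share identical style statistics, which is one reason Gram-only guidance produces the local distortions the abstract mentions.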

Video style transfer is attracting increasing attention from the artificial intelligence community because of its numerous applications, such as augmented reality and animation production. Relative to traditional image style transfer, video style transfer presents new challenges, including how to effectively generate satisfactory stylized results for any specified style while maintaining temporal coherence across frames. Towards this end, we propose a Multi-Channel Correlation network (MCCNet), which can be trained to fuse...

10.1609/aaai.v35i2.16208 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Vision transformers (ViTs) have recently received explosive popularity, but their huge computational cost is still a severe issue. Since the computation complexity of ViT is quadratic with respect to the input sequence length, a mainstream paradigm for computation reduction is to reduce the number of tokens. Existing designs include structured spatial compression, which uses a progressive shrinking pyramid to reduce the computations of large feature maps, and unstructured token pruning, which dynamically drops redundant tokens. However, the limitation of existing designs lies in...

10.1609/aaai.v36i3.20202 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
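The unstructured token pruning that the abstract above contrasts with spatial compression can be sketched as score-based selection: keep the [CLS] token plus the top-k patch tokens ranked by [CLS] attention. This is a generic sketch under assumed shape conventions, not this paper's specific design.

```python
import numpy as np

def prune_tokens(tokens, cls_attn, keep_ratio=0.5):
    """Unstructured token pruning sketch: keep the [CLS] token plus the
    patch tokens with the highest [CLS]-attention scores.

    tokens:   (N, D) array with tokens[0] as the [CLS] token
    cls_attn: (N-1,) attention of [CLS] over the patch tokens
    """
    n_patches = tokens.shape[0] - 1
    k = max(1, int(n_patches * keep_ratio))
    keep = np.argsort(cls_attn)[::-1][:k]  # indices of the most-attended patches
    keep = np.sort(keep) + 1               # shift past [CLS], preserve order
    return np.concatenate([tokens[:1], tokens[keep]], axis=0)
```

Halving the token count roughly quarters the cost of each quadratic attention layer, which is why token reduction is such an attractive lever for ViT efficiency.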

We present a novel method for content-aware image resizing based on optimization of a well-defined image distance function, which preserves both the important regions and the global visual effect (the background or other decorative objects) of an image. The method operates by joint use of seam carving and scaling. The principle behind our method is a bidirectional similarity function based on the image Euclidean distance (IMED), cooperating with a dominant color descriptor (DCD) and seam energy variation. The function is suitable for quantitative evaluation of the resizing result and determination of the best...

10.1145/1618452.1618471 article EN ACM Transactions on Graphics 2009-12-01
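As background for the abstract above, the seam-carving half of the combination can be sketched as a standard dynamic program over a per-pixel energy map. This is the textbook seam search, not the paper's bidirectional-similarity optimization.

```python
import numpy as np

def min_vertical_seam(energy):
    """Dynamic-programming search for the minimum-energy vertical seam,
    the core step of seam carving. energy: (H, W) array.
    Returns one column index per row, top to bottom."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]   # cost from upper-left neighbor
        right = np.r_[cost[i - 1, 1:], np.inf]   # cost from upper-right neighbor
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    seam = [int(np.argmin(cost[-1]))]            # cheapest endpoint in last row
    for i in range(h - 2, -1, -1):               # backtrack upward
        j = seam[-1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam.append(lo + int(np.argmin(cost[i, lo:hi])))
    return seam[::-1]
```

Removing the returned seam deletes one pixel per row along a low-energy path, shrinking the width by one while avoiding important content; scaling is then used where seam removal alone would distort the layout.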

This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, the "Segment-Tree", is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with reference to the color or intensity of the image; second, a tree graph is created for each segment; and in the final step, these independent segment graphs are linked to form the tree structure. In...

10.1109/cvpr.2013.47 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01
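The first of the three construction steps described above, grouping pixels into segments by color or intensity, is naturally expressed with a union-find structure. The sketch below uses a simplified greedy intensity threshold rather than the paper's actual graph-based segmentation criterion.

```python
class UnionFind:
    """Minimal disjoint-set structure with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def segment_pixels(intensity, threshold):
    """Simplified segment grouping: merge 4-connected pixels whose
    intensity difference is at most `threshold`.
    intensity: list of rows of ints. Returns one segment label per pixel."""
    h, w = len(intensity), len(intensity[0])
    uf = UnionFind(h * w)
    for i in range(h):
        for j in range(w):
            if j + 1 < w and abs(intensity[i][j] - intensity[i][j + 1]) <= threshold:
                uf.union(i * w + j, i * w + j + 1)
            if i + 1 < h and abs(intensity[i][j] - intensity[i + 1][j]) <= threshold:
                uf.union(i * w + j, (i + 1) * w + j)
    return [uf.find(p) for p in range(h * w)]
```

In the full method, a spanning tree is then built inside each resulting segment, and the segment trees are linked into one structure over which matching costs are aggregated.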

Aggregation structures with explicit information, such as image attributes and scene semantics, are effective and popular for intelligent systems assessing the aesthetics of visual data. However, such useful information may not be available due to the high cost of manual annotation and expert design. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Different from state-of-the-art methods, which augment an MP aggregation network with various attributes, we train the model in an end-to-end manner...

10.1145/3240508.3240554 article EN Proceedings of the 30th ACM International Conference on Multimedia 2018-10-15

Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works....

10.1007/s41095-021-0247-3 article EN cc-by Computational Visual Media 2021-10-27
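The self-attention mechanism the survey abstract contrasts with RNN sequential structure can be written down compactly: every token attends to every other token, so global information mixes in a single step and all rows can be computed in parallel. A minimal single-head sketch in NumPy, with projection names assumed:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (N, D) token embeddings; w_q, w_k, w_v: (D, D_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (N, N) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ v                              # (N, D_head) mixed outputs
```

Unlike an RNN, nothing here depends on processing tokens in order, which is what makes parallel training possible; the price is the (N, N) score matrix, i.e. the quadratic cost discussed elsewhere on this page.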

Weakly supervised object localization (WSOL) remains an open problem, given the difficulty of finding object extent information using a classification network. Although prior works have struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network has been neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for...

10.1109/cvpr46437.2021.01147 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages...

10.1145/3618342 article EN cc-by ACM Transactions on Graphics 2023-12-05

Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has been recently proposed to learn continuous prompts using task-specific training data. Despite the performance improvements on downstream tasks, several studies have reported that CoOp suffers from an overfitting issue in two aspects: (i) the test accuracy on base classes first improves and then...

10.1109/tcsvt.2023.3245584 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-02-16

Despite the impressive results of arbitrary image-guided style transfer methods, text-driven image stylization has recently been proposed for transferring a natural image into a stylized one according to textual descriptions of the target style provided by the user. Unlike previous image-to-image transfer approaches, text-guided stylization provides users with a more precise and intuitive way to express the desired style. However, the huge discrepancy between cross-modal inputs/outputs makes it challenging to conduct text-driven image stylization in a typical feed-forward...

10.1109/tnnls.2023.3342645 article EN IEEE Transactions on Neural Networks and Learning Systems 2024-01-10

In this paper, we address the following research problem: How can we generate a meaningful split grammar that explains a given facade layout? To evaluate if a grammar is meaningful, we propose a cost function based on the description length and minimize it using an approximate dynamic programming framework. Our evaluation indicates that our framework extracts grammars that are competitive with those of expert users, while some users and all competing automatic solutions are less successful.

10.1145/2601097.2601162 article EN ACM Transactions on Graphics 2014-07-22

This paper presents a novel content-based method for transferring the colour patterns between images. Unlike previous methods that rely on image colour statistics, our method puts an emphasis on high-level scene content analysis. We first automatically extract the foreground subject areas and background layout from the scene. The semantic correspondences of the regions between the source and target images are then established. In the second step, the source image is re-coloured in an optimization framework, which incorporates the extracted content information and spatial...

10.1111/cgf.12008 article EN Computer Graphics Forum 2013-01-11

In this paper, we address the problem of natural flower classification. It is a challenging task due to non-rigid deformation, illumination changes, and inter-class similarity. We build a large dataset of flower images in the wild with 79 categories and propose a novel framework based on a convolutional neural network (CNN) to solve the problem. Unlike other methods that use hand-crafted visual features, our method automatically learns good features for the task. The network consists of five layers where small receptive fields are...

10.1109/fspma.2016.7818296 article EN 2016-11-01

Arbitrary image stylization by neural networks has become a popular topic, and video stylization is attracting more attention as an extension of image stylization. However, when image stylization methods are applied to videos, unsatisfactory results that suffer from severe flickering effects appear. In this article, we conduct a detailed and comprehensive analysis of the cause of such flickering effects. Systematic comparisons among typical neural style transfer approaches show that the feature migration modules of state-of-the-art (SOTA) learning systems...

10.1109/tnnls.2022.3230084 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-01-06

Facial action unit (AU) intensity estimation plays an important role in affective computing and human-computer interaction. Recent works have introduced deep neural networks for AU intensity estimation, but they require a large amount of intensity annotations. AU annotation needs strong domain expertise, and it is expensive to construct a large database from which to learn deep models. We propose a novel knowledge-based semi-supervised deep convolutional network for AU intensity estimation with extremely limited annotations. Only the annotations of peak and valley frames in the training sequences are...

10.1109/cvpr.2018.00246 article EN 2018-06-01

Facial action units (AUs) play an important role in human emotion understanding. One big challenge for data-driven AU recognition approaches is the lack of enough annotations, since AU annotation requires strong domain expertise. To alleviate this issue, we propose a knowledge-driven method for jointly learning multiple AU classifiers without any annotation, by leveraging prior probabilities on AUs, including expression-independent and expression-dependent probabilities. These prior probabilities are drawn from facial anatomy...

10.1109/cvpr.2018.00536 article EN 2018-06-01

Automatic intensity estimation of facial action units (AUs) is challenging in two aspects. First, capturing subtle changes in facial appearance is quite difficult. Second, the annotation of AU intensity is scarce and expensive: intensity annotation requires strong domain knowledge, and thus only experts are qualified. The majority of methods directly apply supervised learning techniques to AU intensity estimation, while few exploit unlabeled samples to improve performance. In this paper, we propose a novel weakly supervised regression model, Bilateral Ordinal Relevance...

10.1109/cvpr.2018.00735 article EN 2018-06-01

Personalized image aesthetic assessment (PIAA) has recently become a hot topic due to its wide applications, such as photography, film and television, e-commerce, and fashion design. This task is strongly affected by subjective factors and by the samples provided by users. In order to acquire a precise personalized aesthetic distribution from a small amount of samples, we propose a novel user-guided PIAA framework. The framework leverages user interactions to retouch and rank images for aesthetic assessment based on deep reinforcement learning (DRL),...

10.1109/tmm.2021.3130752 article EN IEEE Transactions on Multimedia 2021-11-25

This work presents Unified Contrastive Arbitrary Style Transfer (UCAST), a novel style representation learning and transfer framework that can fit into most existing arbitrary image style transfer models, such as CNN-based, ViT-based, and flow-based methods. As the key component of stylization tasks, a suitable style representation is essential to achieve satisfactory results. Existing approaches based on deep neural networks typically use second-order statistics to generate the output. However, these hand-crafted features computed from a single image cannot...

10.1145/3605548 article EN ACM Transactions on Graphics 2023-06-20

10.1109/cvpr52733.2024.00662 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16