- Generative Adversarial Networks and Image Synthesis
- Advanced Image and Video Retrieval Techniques
- Computer Graphics and Visualization Techniques
- Image Enhancement Techniques
- Visual Attention and Saliency Detection
- Multimodal Machine Learning Applications
- Image Retrieval and Classification Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- 3D Shape Modeling and Analysis
- Video Analysis and Summarization
- Aesthetic Perception and Analysis
- Music and Audio Processing
- Face Recognition and Analysis
- Human Motion and Animation
- Advanced Image Fusion Techniques
- Video Surveillance and Tracking Methods
- Music Technology and Sound Studies
- Image Processing Techniques and Applications
- Cancer-related Molecular Mechanisms Research
- Speech and Audio Processing
- 3D Surveying and Cultural Heritage
- Image Processing and 3D Reconstruction
Chinese Academy of Sciences
2016-2025
Institute of Automation
2016-2025
Beijing Academy of Artificial Intelligence
2020-2024
University of Chinese Academy of Sciences
2018-2024
Shandong Institute of Automation
2009-2024
University College of Applied Science
2023
Institute of Automation
2009-2021
Jilin University
2021
Shandong Institute of Business and Technology
2021
Shandong University
2021
Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and aligned along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited datasets hinder the development of this task. To resolve the first two issues, we...
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality of convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods suffer from biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr2. In contrast with visual transformers for other vision...
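A minimal sketch of the kind of mechanism this abstract points to: content patch tokens attending to style patch tokens via cross-attention, so every content patch can draw on any style patch (the long-range dependency the abstract mentions). The module, names, and dimensions are illustrative assumptions, not the actual StyTr2 architecture.

```python
# Hypothetical sketch: stylizing content tokens with cross-attention to style tokens.
import torch
import torch.nn as nn

class CrossAttnStylizer(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, content_tokens, style_tokens):
        # Content tokens query the style tokens: each content patch can attend
        # to any style patch, regardless of spatial distance.
        out, _ = self.attn(query=content_tokens, key=style_tokens, value=style_tokens)
        return self.norm(content_tokens + out)

content = torch.randn(1, 196, 512)   # 14x14 grid of content patch embeddings
style = torch.randn(1, 196, 512)     # style patch embeddings
print(CrossAttnStylizer()(content, style).shape)  # torch.Size([1, 196, 512])
```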
Arbitrary style transfer is a significant topic with both research value and application prospects. A desired style transfer, given a content image and a referenced style painting, would render the content image with the color tone and vivid stroke patterns of the style painting while synchronously maintaining the detailed content structure information. Style transfer approaches initially learn content and style representations of their inputs and then generate stylized images guided by these representations. In this paper, we propose a multi-adaptation network which involves two self-adaptation (SA)...
The artistic style within a painting is its means of expression, which includes not only the painting material, colors, and brushstrokes, but also high-level attributes such as semantic elements and object shapes. Previous arbitrary example-guided image generation methods often fail to control shape changes or convey semantic elements. Pre-trained text-to-image synthesis diffusion probabilistic models have achieved remarkable quality but require extensive textual descriptions to accurately portray the attributes of a particular...
In this work, we tackle the challenging problem of arbitrary image style transfer using a novel style feature representation learning method. A suitable style representation, as a key component in image stylization tasks, is essential to achieving satisfactory results. Existing deep neural network based approaches achieve reasonable results with guidance from second-order statistics, such as the Gram matrix of content features. However, they do not leverage sufficient style information, which leads to artifacts such as local distortions and...
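For reference, the second-order statistic this abstract criticizes is the Gram matrix of a CNN feature map, as used in classic neural style transfer. A minimal sketch of that computation (dimensions are illustrative):

```python
# Gram matrix of a feature map: channel-by-channel correlations, averaged over positions.
import torch

def gram_matrix(feat):
    # feat: (B, C, H, W) feature map from a CNN layer
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)  # (B, C, C)

feat = torch.randn(2, 64, 32, 32)
print(gram_matrix(feat).shape)  # torch.Size([2, 64, 64])
```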
Video style transfer is attracting increasing attention from the artificial intelligence community because of its numerous applications, such as augmented reality and animation production. Compared with traditional image style transfer, video style transfer presents new challenges, including how to effectively generate satisfactory stylized results for any specified style while maintaining temporal coherence across frames. Towards this end, we propose a Multi-Channel Correlation network (MCCNet), which can be trained to fuse...
Vision transformers (ViTs) have recently received explosive popularity, but their huge computational cost remains a severe issue. Since the computational complexity of a ViT is quadratic with respect to the input sequence length, a mainstream paradigm for computation reduction is to reduce the number of tokens. Existing designs include structured spatial compression, which uses a progressive shrinking pyramid to reduce the computation on large feature maps, and unstructured token pruning, which dynamically drops redundant tokens. However, the limitation of existing designs lies in...
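To make the "unstructured token pruning" idea concrete, here is an illustrative sketch (not this paper's method): keep only the patch tokens that receive the most attention from the [CLS] token.

```python
# Keep the top-k patch tokens ranked by the [CLS] attention they receive.
import torch

def prune_tokens(tokens, cls_attn, keep_ratio=0.5):
    # tokens: (B, N, D) patch tokens; cls_attn: (B, N) attention from [CLS] to patches
    k = max(1, int(tokens.shape[1] * keep_ratio))
    idx = cls_attn.topk(k, dim=1).indices                      # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])   # (B, k, D)
    return torch.gather(tokens, 1, idx)

tokens = torch.randn(2, 196, 384)
cls_attn = torch.rand(2, 196)
print(prune_tokens(tokens, cls_attn).shape)  # torch.Size([2, 98, 384])
```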
We present a novel method for content-aware image resizing based on the optimization of a well-defined distance function, which preserves both the important regions and the global visual effect (the background or other decorative objects) of an image. The method operates by the joint use of seam carving and scaling. The principle behind our method is a bidirectional similarity function based on the image Euclidean distance (IMED), cooperating with a dominant color descriptor (DCD) energy variation. A suitable quantitative evaluation enables the determination of the best...
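A simplified sketch of the seam-carving half of such a pipeline: dynamic programming over an energy map to find the vertical seam with minimum cumulative energy. The IMED/DCD energy terms from the abstract are replaced by a plain gradient magnitude here, so this is only an assumption-laden illustration of the underlying operator.

```python
# Find the minimum-energy vertical seam of a grayscale image via dynamic programming.
import numpy as np

def min_vertical_seam(gray):
    energy = np.abs(np.gradient(gray, axis=0)) + np.abs(np.gradient(gray, axis=1))
    h, w = energy.shape
    cost = energy.copy()
    for i in range(1, h):
        left = np.roll(cost[i - 1], 1);   left[0] = np.inf   # up-left neighbour
        right = np.roll(cost[i - 1], -1); right[-1] = np.inf  # up-right neighbour
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backtrack the seam column indices from bottom to top.
    seam = [int(np.argmin(cost[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam.append(lo + int(np.argmin(cost[i, lo:hi])))
    return seam[::-1]

print(min_vertical_seam(np.random.rand(6, 8)))
```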
This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, the "Segment-Tree", is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with reference to the color or intensity image; second, a tree graph is created for each segment; and in the final step, these independent segment graphs are linked to form the tree structure. In...
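A rough sketch of the first step only, under simplifying assumptions: pixels are grouped into segments by merging 4-neighbour edges in order of colour difference with a union-find structure (a Felzenszwalb-style grouping on a grayscale image). The paper then builds a tree per segment and links the segment trees; that part is omitted here.

```python
# Group pixels of a grayscale image into segments by thresholded edge merging.
import numpy as np

def segment_pixels(img, threshold=0.1):
    h, w = img.shape[:2]
    parent = list(range(h * w))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []                        # (colour difference, pixel a, pixel b)
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                edges.append((abs(float(img[y, x]) - float(img[y, x + 1])), y * w + x, y * w + x + 1))
            if y + 1 < h:
                edges.append((abs(float(img[y, x]) - float(img[y + 1, x])), y * w + x, (y + 1) * w + x))
    for wgt, a, b in sorted(edges):   # merge cheap edges first
        if wgt < threshold:
            parent[find(a)] = find(b)
    return np.array([find(i) for i in range(h * w)]).reshape(h, w)

print(segment_pixels(np.random.rand(4, 5)))
```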
Aggregation structures with explicit information, such as image attributes and scene semantics, are effective and popular for intelligent systems that assess the aesthetics of visual data. However, such useful information may not be available due to the high cost of manual annotation and expert design. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Different from state-of-the-art methods, which augment an MP aggregation network with various attributes, we train the model in an end-to-end manner...
Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works....
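The self-attention mechanism the survey refers to is standard scaled dot-product attention; written out explicitly below as a reminder (dimensions and projection matrices are illustrative, not taken from the survey):

```python
# Scaled dot-product self-attention: every token attends to all other tokens.
import torch
import torch.nn.functional as F

def self_attention(x, wq, wk, wv):
    # x: (B, N, D) token embeddings; wq/wk/wv: (D, D) projection matrices
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (B, N, N)
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(1, 16, 64)
w = [torch.randn(64, 64) for _ in range(3)]
print(self_attention(x, *w).shape)  # torch.Size([1, 16, 64])
```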
Weakly supervised object localization (WSOL) remains an open problem given the difficulty of finding object extent information using a classification network. Although prior works have struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network has been neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for...
Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages...
Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability on downstream vision tasks with appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has recently been proposed to learn continuous prompts using task-specific training data. Despite the performance improvements on downstream tasks, several studies have reported that CoOp suffers from an overfitting issue in two aspects: (i) the test accuracy on base classes first improves and then...
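A hedged sketch of the CoOp idea the abstract builds on: a few learnable context vectors, shared across classes, are prepended to each class-name embedding, and classification scores are cosine similarities between image features and the resulting prompt features. The encoders below are stand-ins, not the real CLIP modules.

```python
# Learnable prompt context for a CLIP-like classifier (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    def __init__(self, n_cls, n_ctx=4, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)               # learnable context
        self.cls_emb = nn.Parameter(torch.randn(n_cls, 1, dim), requires_grad=False)  # frozen class-name embeddings

    def forward(self):
        n_cls = self.cls_emb.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        return torch.cat([ctx, self.cls_emb], dim=1)   # (n_cls, n_ctx + 1, dim)

def logits(image_feat, text_feats, scale=100.0):
    # cosine similarity between image features and each class prompt feature
    img = F.normalize(image_feat, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    return scale * img @ txt.t()

prompts = PromptLearner(n_cls=10)()       # would normally be fed to a frozen text encoder
text_feats = prompts.mean(dim=1)          # placeholder for the text-encoder output
print(logits(torch.randn(2, 512), text_feats).shape)  # torch.Size([2, 10])
```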
Despite the impressive results of arbitrary image-guided style transfer methods, text-driven image stylization has recently been proposed for transferring a natural image into a stylized one according to textual descriptions of the target style provided by the user. Unlike previous image-to-image transfer approaches, text-guided stylization provides users with a more precise and intuitive way to express the desired style. However, the huge discrepancy between cross-modal inputs/outputs makes it challenging to conduct text-driven image stylization in a typical feed-forward...
In this paper, we address the following research problem: how can we generate a meaningful split grammar that explains a given facade layout? To evaluate whether a grammar is meaningful, we propose a cost function based on description length and minimize it using an approximate dynamic programming framework. Our evaluation indicates that our framework extracts grammars that are competitive with those of expert users, while some users and all competing automatic solutions are less successful.
This paper presents a novel content-based method for transferring the colour patterns between images. Unlike previous methods that rely on image colour statistics, our method puts an emphasis on high-level scene content analysis. We first automatically extract the foreground subject areas and the background scene layout from the scene. The semantic correspondences of the regions between the source and target images are then established. In the second step, the source image is re-coloured in an optimization framework, which incorporates the extracted content information and spatial...
In this paper, we address the problem of natural flower classification. It is a challenging task due to non-rigid deformation, illumination changes, and inter-class similarity. We build a large dataset of flower images in the wild with 79 categories and propose a novel framework based on a convolutional neural network (CNN) to solve the problem. Unlike other methods that use hand-crafted visual features, our method utilizes a CNN to automatically learn good features for classification. The network consists of five layers, where small receptive fields are...
Arbitrary image stylization by neural networks has become a popular topic, and video stylization is attracting more attention as an extension of image stylization. However, when image stylization methods are applied to videos, unsatisfactory results that suffer from severe flickering effects appear. In this article, we conduct a detailed and comprehensive analysis of the cause of such flickering effects. Systematic comparisons among typical neural style transfer approaches show that the feature migration modules of state-of-the-art (SOTA) learning systems...
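One common way to quantify the flicker discussed here (an illustrative convention, not necessarily the measure used in this article) is a temporal-consistency error: warp the previous stylized frame to the current one with optical flow and measure the masked difference in non-occluded regions.

```python
# Temporal consistency between consecutive stylized frames, given flow and an occlusion mask.
import torch
import torch.nn.functional as F

def temporal_loss(stylized_t, stylized_prev, flow, mask):
    # stylized_*: (B, C, H, W); flow: (B, 2, H, W) in pixels (x, y); mask: (B, 1, H, W)
    b, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(flow)  # (1, 2, H, W)
    coords = base + flow
    coords[:, 0] = 2 * coords[:, 0] / (w - 1) - 1   # normalize x to [-1, 1]
    coords[:, 1] = 2 * coords[:, 1] / (h - 1) - 1   # normalize y to [-1, 1]
    grid = coords.permute(0, 2, 3, 1)               # (B, H, W, 2) for grid_sample
    warped = F.grid_sample(stylized_prev, grid, align_corners=True)
    return ((stylized_t - warped).abs() * mask).mean()

x_t, x_p = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
flow, mask = torch.zeros(1, 2, 64, 64), torch.ones(1, 1, 64, 64)
print(float(temporal_loss(x_t, x_p, flow, mask)))
```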
Facial action unit (AU) intensity estimation plays an important role in affective computing and human-computer interaction. Recent works have introduced deep neural networks for AU intensity estimation, but they require a large amount of intensity annotations. AU annotation needs strong domain expertise, and it is expensive to construct an extensive database to learn deep models. We propose a novel knowledge-based semi-supervised deep convolutional network for AU intensity estimation with extremely limited annotations. Only the annotations of peak and valley frames in training sequences are...
Facial action units (AUs) play an important role in human emotion understanding. One big challenge for data-driven AU recognition approaches is the lack of enough annotations, since AU annotation requires strong domain expertise. To alleviate this issue, we propose a knowledge-driven method for jointly learning multiple AU classifiers without any AU annotation by leveraging prior probabilities on AUs, including expression-independent and expression-dependent probabilities. These prior probabilities are drawn from facial anatomy...
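A toy illustration (not this paper's actual formulation) of how prior probabilities can supervise classifiers in the absence of labels: push the batch-level AU activation rates predicted by the network towards known prior activation rates.

```python
# Match empirical AU activation rates in a batch to prior probabilities.
import torch

def prior_matching_loss(au_logits, prior_probs):
    # au_logits: (B, n_AU) raw outputs; prior_probs: (n_AU,) expected activation rates
    pred_rates = torch.sigmoid(au_logits).mean(dim=0)   # empirical rate per AU in the batch
    return ((pred_rates - prior_probs) ** 2).sum()

logits = torch.randn(32, 12, requires_grad=True)
priors = torch.full((12,), 0.3)   # hypothetical priors; real values come from facial anatomy
loss = prior_matching_loss(logits, priors)
loss.backward()
print(float(loss))
```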
Automatic intensity estimation of facial action units (AUs) is challenging in two aspects. First, capturing subtle changes in facial appearance is quite difficult. Second, the annotation of AU intensity is scarce and expensive; intensity annotation requires strong domain knowledge, and thus only experts are qualified. The majority of methods directly apply supervised learning techniques to AU intensity estimation, while few exploit unlabeled samples to improve performance. In this paper, we propose a novel weakly supervised regression model, Bilateral Ordinal Relevance...
Personalized image aesthetic assessment (PIAA) has recently become a hot topic due to its wide applications, such as photography, film and television, e-commerce, and fashion design. This task is strongly affected by subjective factors and the limited samples provided by users. In order to acquire a precise personalized aesthetic distribution from a small amount of samples, we propose a novel user-guided framework. The framework leverages user interactions to retouch and rank images for personalized aesthetic assessment based on deep reinforcement learning (DRL),...
This work presents Unified Contrastive Arbitrary Style Transfer (UCAST), a novel style representation learning and transfer framework, which can fit in most existing arbitrary image style transfer models, such as CNN-based, ViT-based, and flow-based methods. As a key component in stylization tasks, a suitable style representation is essential to achieve satisfactory results. Existing approaches based on deep neural networks typically use second-order statistics to generate the output. However, these hand-crafted features computed from a single...
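A hedged sketch of contrastive style representation learning in the spirit of this abstract: embeddings of two views of the same artwork form a positive pair, while other artworks in the batch serve as negatives. This is a standard InfoNCE objective, not the exact UCAST formulation.

```python
# InfoNCE-style contrastive loss over style embeddings.
import torch
import torch.nn.functional as F

def style_contrastive_loss(z1, z2, temperature=0.1):
    # z1, z2: (B, D) style embeddings of two views of the same artworks
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(z1.shape[0])       # diagonal entries are the positive pairs
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(float(style_contrastive_loss(z1, z2)))
```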