Wenqing Chu

ORCID: 0000-0003-0816-7975
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Face Recognition and Analysis
  • Advanced Image Processing Techniques
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Advanced Vision and Imaging
  • Multimodal Machine Learning Applications
  • Image Enhancement Techniques
  • Computer Graphics and Visualization Techniques
  • Speech and Audio Processing
  • Human Motion and Animation
  • Image Retrieval and Classification Techniques
  • Image Processing Techniques and Applications
  • 3D Surveying and Cultural Heritage
  • 3D Shape Modeling and Analysis
  • Biological Activity of Diterpenoids and Biflavonoids
  • Advanced Manufacturing and Logistics Optimization
  • Digital Media Forensic Detection
  • Gait Recognition and Analysis
  • Visual Attention and Saliency Detection
  • Stress Responses and Cortisol
  • Blind Source Separation Techniques

Beijing Institute of Technology
2024

Baidu (China)
2024

Tencent (China)
2020-2023

Zhejiang University
2016-2021

Seoul National University
2021

Inner Mongolia University for Nationalities
2020

Alibaba Group (China)
2017

Huazhong University of Science and Technology
2013

Prevailing video frame interpolation algorithms, which generate the intermediate frames from consecutive inputs, typically rely on complex model architectures with heavy parameters or large delay, hindering them from diverse real-time applications. In this work, we devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesizing. It first extracts pyramid features from the given inputs, and then refines the bilateral flow fields together with a powerful intermediate feature until...

10.1109/cvpr52688.2022.00201 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
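
A minimal sketch of the coarse-to-fine idea described above: extract pyramid features from the two input frames, then refine bilateral flows level by level. The module sizes and names here are illustrative stand-ins, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(in_c, out_c, stride=1):
    return nn.Sequential(nn.Conv2d(in_c, out_c, 3, stride, 1), nn.PReLU(out_c))

class Encoder(nn.Module):
    """Shared pyramid encoder applied to both input frames."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([conv(3, 32, 2), conv(32, 48, 2), conv(48, 64, 2)])
    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # fine -> coarse

class Decoder(nn.Module):
    """One refinement level: updates the bilateral flow fields."""
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(conv(2 * c + 4, c), nn.Conv2d(c, 4, 3, 1, 1))
    def forward(self, f0, f1, flow):
        delta = self.block(torch.cat([f0, f1, flow], dim=1))
        return flow + delta  # refined bilateral flows (t->0 and t->1)

encoder = Encoder()
decoders = nn.ModuleList([Decoder(64), Decoder(48), Decoder(32)])

img0, img1 = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
f0s, f1s = encoder(img0), encoder(img1)
flow = torch.zeros(1, 4, 16, 16)               # coarsest bilateral flow
for f0, f1, dec in zip(reversed(f0s), reversed(f1s), decoders):
    flow = F.interpolate(flow, size=f0.shape[-2:], mode="bilinear", align_corners=False) * 2
    flow = dec(f0, f1, flow)
print(flow.shape)  # flows at the finest pyramid level, used to warp toward the intermediate frame
```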

Abnormal event detection in large videos is an important task for research and industrial applications, which has attracted considerable attention in recent years. Existing methods usually solve this problem by extracting local features and then learning an outlier model on the training videos. However, most previous approaches merely employ hand-crafted visual features, a clear disadvantage due to their limited representation capacity. In this paper, we present a novel unsupervised deep feature learning algorithm for the...

10.1109/tmm.2018.2846411 article EN IEEE Transactions on Multimedia 2018-06-11
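
A hedged sketch of the overall pipeline outlined above: learn features for video patches without labels, then fit an outlier model on normal training data. The plain autoencoder and one-class SVM below are generic stand-ins, not the paper's exact feature learner or outlier model.

```python
import torch
import torch.nn as nn
from sklearn.svm import OneClassSVM

class FeatureAE(nn.Module):
    def __init__(self, dim_in=1024, dim_feat=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 512), nn.ReLU(), nn.Linear(512, dim_feat))
        self.dec = nn.Sequential(nn.Linear(dim_feat, 512), nn.ReLU(), nn.Linear(512, dim_in))
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

ae = FeatureAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
patches = torch.rand(256, 1024)               # flattened spatio-temporal patches (toy data)

for _ in range(5):                            # unsupervised reconstruction training
    recon, _ = ae(patches)
    loss = nn.functional.mse_loss(recon, patches)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    _, feats = ae(patches)
outlier_model = OneClassSVM(nu=0.1).fit(feats.numpy())   # outlier model on normal videos
scores = outlier_model.decision_function(feats.numpy())  # low scores -> abnormal events
```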

In this work, we propose a high fidelity face swapping method, called HifiFace, which can well preserve the face shape of the source face and generate photo-realistic results. Unlike other existing works that only use a face recognition model to keep the identity similarity, we propose 3D shape-aware identity control with geometric supervision from a 3DMM and a 3D face reconstruction method. Meanwhile, we introduce a Semantic Facial Fusion module to optimize the combination of encoder and decoder features and make adaptive blending, which makes the results more photo-realistic....

10.24963/ijcai.2021/157 article EN 2021-08-01
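
A rough sketch of the "3D shape-aware identity" idea from the HifiFace entry above: the identity condition combines a face-recognition embedding of the source with 3DMM shape coefficients, so the generator receives geometric as well as identity information. All modules (IdEncoder, ShapeRegressor, SwapGenerator) are toy placeholders for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class IdEncoder(nn.Module):           # stand-in for a pretrained face recognition model
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3 * 112 * 112, 512)
    def forward(self, x):
        return self.net(x.flatten(1))

class ShapeRegressor(nn.Module):      # stand-in for a 3DMM reconstruction network
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3 * 112 * 112, 80)
    def forward(self, x):
        return self.net(x.flatten(1))

class SwapGenerator(nn.Module):       # consumes the target image plus the shape-aware identity
    def __init__(self):
        super().__init__()
        self.fuse = nn.Linear(512 + 80, 3 * 112 * 112)
    def forward(self, target, id_vec):
        return torch.tanh(self.fuse(id_vec)).view_as(target) + target  # toy blending

source, target = torch.rand(1, 3, 112, 112), torch.rand(1, 3, 112, 112)
id_vec = torch.cat([IdEncoder()(source), ShapeRegressor()(source)], dim=1)
swapped = SwapGenerator()(target, id_vec)
print(swapped.shape)  # geometry-aware identity conditions the swap result
```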

Motion blur is a common photography artifact in dynamic environments that typically comes jointly with other types of degradation. This paper reviews the NTIRE 2021 Challenge on Image Deblurring. In this challenge report, we describe the challenge specifics and the evaluation results from the 2 competition tracks with the proposed solutions. While both tracks aim to recover a high-quality clean image from a blurry image, different artifacts are jointly involved. In track 1, the images are of low resolution, while in track 2 they are compressed in JPEG format. In each competition, there...

10.1109/cvprw53098.2021.00025 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01

Vehicle detection is a challenging problem in autonomous driving systems, due to its large structural and appearance variations. In this paper, we propose a novel vehicle detection scheme based on multi-task deep convolutional neural networks (CNNs) and region-of-interest (RoI) voting. In the design of the CNN architecture, we enrich the supervised information with subcategory, region overlap, bounding-box regression, and category of each training RoI as a multi-task learning framework. This allows the model to share visual knowledge among...

10.1109/tip.2017.2762591 article EN IEEE Transactions on Image Processing 2017-10-12
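
A hedged sketch of the multi-task supervision described above: each training RoI carries category, subcategory, region-overlap and bounding-box targets, and one shared feature feeds four heads. Head sizes and loss weights are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskRoIHead(nn.Module):
    def __init__(self, feat_dim=256, num_cls=2, num_sub=6):
        super().__init__()
        self.cls = nn.Linear(feat_dim, num_cls)       # vehicle vs. background
        self.sub = nn.Linear(feat_dim, num_sub)       # subcategory (e.g., pose/type)
        self.iou = nn.Linear(feat_dim, 1)             # region overlap with ground truth
        self.box = nn.Linear(feat_dim, 4)             # bounding-box regression

    def forward(self, roi_feat):
        return self.cls(roi_feat), self.sub(roi_feat), self.iou(roi_feat), self.box(roi_feat)

head = MultiTaskRoIHead()
roi_feat = torch.rand(8, 256)                         # pooled features of 8 RoIs (toy)
cls_t, sub_t = torch.randint(0, 2, (8,)), torch.randint(0, 6, (8,))
iou_t, box_t = torch.rand(8, 1), torch.rand(8, 4)

cls_p, sub_p, iou_p, box_p = head(roi_feat)
loss = (F.cross_entropy(cls_p, cls_t) + F.cross_entropy(sub_p, sub_t)
        + F.mse_loss(iou_p, iou_t) + F.smooth_l1_loss(box_p, box_t))
loss.backward()  # the shared RoI features learn from all four tasks jointly
```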

Model generalization to unseen scenes is crucial for real-world applications, such as autonomous driving, which requires robust vision systems. To enhance model generalization, domain generalization through learning a domain-invariant representation has been widely studied. However, most existing works learn a shared feature space within multi-source domains but ignore the characteristic of the feature itself (e.g., its sensitivity to domain-specific style). Therefore, we propose Domain-invariant Representation Learning (DIRL) for...

10.1609/aaai.v36i3.20193 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
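
The abstract above emphasizes the sensitivity of individual features to domain-specific style. A hedged illustration of that notion: perturb the style statistics of a feature map (AdaIN-style mean/std swap) and measure how strongly each channel reacts; strongly reacting channels are style-sensitive and could be down-weighted. This is only an interpretation for illustration, not the DIRL algorithm itself.

```python
import torch

def swap_style(feat, ref):
    """Re-normalize `feat` channels to the per-channel statistics of `ref`."""
    mu, std = feat.mean(dim=(2, 3), keepdim=True), feat.std(dim=(2, 3), keepdim=True) + 1e-6
    mu_r, std_r = ref.mean(dim=(2, 3), keepdim=True), ref.std(dim=(2, 3), keepdim=True) + 1e-6
    return (feat - mu) / std * std_r + mu_r

feat_a = torch.rand(4, 64, 32, 32)   # features from source domain A (toy)
feat_b = torch.rand(4, 64, 32, 32)   # features from source domain B (toy)

perturbed = swap_style(feat_a, feat_b)
sensitivity = (perturbed - feat_a).abs().mean(dim=(0, 2, 3))   # per-channel style sensitivity
weights = 1.0 / (1.0 + sensitivity)                            # de-emphasize sensitive channels
invariant_feat = feat_a * weights.view(1, -1, 1, 1)
print(weights.shape, invariant_feat.shape)
```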

Blind face restoration, which aims to reconstruct high-quality images from low-quality inputs, can benefit many applications. Although existing generative-based methods achieve significant progress in producing high-quality images, they often fail to restore natural face shapes and high-fidelity facial details from severely-degraded inputs. In this work, we propose to integrate shape and generative priors to guide the challenging blind face restoration. Firstly, we set up a shape restoration module to recover reasonable facial geometry with 3D...

10.1109/cvpr52688.2022.00751 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
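
A schematic sketch of combining a shape prior with a generative prior, as outlined above: a 3D reconstruction branch recovers coarse geometry from the degraded input, and its rendering is fed to a restoration network together with the input. Both modules below are toy placeholders for the purpose of illustration.

```python
import torch
import torch.nn as nn

class ShapeBranch(nn.Module):        # stand-in for 3DMM-based geometry recovery
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, 1, 1)
    def forward(self, lq):
        return torch.sigmoid(self.net(lq))   # coarse geometry rendering

class RestorationNet(nn.Module):     # stand-in for a generative-prior restorer
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, 3, 1, 1)
    def forward(self, lq, shape):
        return torch.sigmoid(self.net(torch.cat([lq, shape], dim=1)))

lq = torch.rand(1, 3, 128, 128)               # severely degraded input (toy)
shape_prior = ShapeBranch()(lq)               # geometry guidance
restored = RestorationNet()(lq, shape_prior)  # shape-guided restoration
print(restored.shape)
```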

Recently, emotional talking face generation has received considerable attention. However, existing methods only adopt one-hot coding, an image, or audio as emotion conditions, thus lacking flexible control in practical applications and failing to handle unseen emotion styles due to limited semantics. They either ignore the one-shot setting or the quality of the generated faces. In this paper, we propose a more generalized framework. Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion...

10.1109/cvpr52729.2023.00639 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Recent advances in image inpainting have shown impressive results for generating plausible visual details on rather simple backgrounds. However, for complex scenes, it is still challenging to restore reasonable contents as the contextual information within the missing regions tends to be ambiguous. To tackle this problem, we introduce pretext tasks that are semantically meaningful for estimating the missing contents. In particular, we perform knowledge distillation on pretext models and adapt the features to image inpainting. The learned...

10.24963/ijcai.2021/183 article EN 2021-08-01
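
A hedged sketch of the distillation idea above: features from a frozen pretext model (here a ResNet-18 trunk as a stand-in for whatever semantic pretext task the paper uses) supervise the inpainting encoder so that the restored content stays semantically meaningful. The architectures are illustrative only.

```python
import torch
import torch.nn as nn
import torchvision.models as models

pretext = models.resnet18(weights=None)                  # stand-in pretext model
pretext_feat = nn.Sequential(*list(pretext.children())[:-2]).eval()
for p in pretext_feat.parameters():
    p.requires_grad = False

inpaint_encoder = nn.Sequential(                         # toy inpainting encoder
    nn.Conv2d(4, 64, 3, 2, 1), nn.ReLU(),
    nn.Conv2d(64, 512, 3, 2, 1), nn.AdaptiveAvgPool2d(7))

image = torch.rand(2, 3, 224, 224)
mask = (torch.rand(2, 1, 224, 224) > 0.3).float()        # 1 = known pixels
masked_input = torch.cat([image * mask, mask], dim=1)

student = inpaint_encoder(masked_input)                  # (2, 512, 7, 7)
with torch.no_grad():
    teacher = pretext_feat(image)                        # (2, 512, 7, 7)
distill_loss = nn.functional.mse_loss(student, teacher)  # adapt pretext knowledge to inpainting
distill_loss.backward()
```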

Super-Resolution (SR) is a fundamental computer vision task that aims to obtain a high-resolution clean image from the given low-resolution counterpart. This paper reviews the NTIRE 2021 Challenge on Video Super-Resolution. We present the evaluation results from the two competition tracks as well as the proposed solutions. Track 1 aims to develop conventional video SR methods focusing on restoration quality. Track 2 assumes a more challenging environment with lower frame rates, casting a spatio-temporal SR problem. In each competition, 247...

10.1109/cvprw53098.2021.00026 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01

In this paper, we present VideoGen, a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion. We leverage an off-the-shelf text-to-image model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt, as a reference image to guide video generation. Then, we introduce an efficient cascaded latent diffusion module conditioned on both the reference image and the text prompt for generating latent video representations, followed by a flow-based...

10.48550/arxiv.2309.00398 preprint EN other-oa arXiv (Cornell University) 2023-01-01

We propose a new attention model for video question answering. The main idea of attention models is to locate the most informative parts of the visual data. Attention mechanisms are quite popular these days. However, existing mechanisms regard the question as a whole. They ignore the word-level semantics, where each word can have different attentions and some words need no attention. Neither do they consider the semantic structure of the sentences. Although Extended Soft Attention (E-SA) for video question answering leverages word-level attention, it performs poorly on long question sentences. In this paper,...

10.1109/tip.2018.2859820 article EN IEEE Transactions on Image Processing 2018-07-25
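
A minimal sketch of word-level attention for video QA as motivated above: every question word attends to the frame features separately (instead of attending with one sentence vector), and a per-word gate decides how much each word contributes. Dimensions and scoring functions are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordLevelAttention(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.score = nn.Linear(2 * d, 1)      # word-to-frame relevance
        self.word_gate = nn.Linear(d, 1)      # some words need (almost) no attention

    def forward(self, words, frames):         # words: (T_w, d), frames: (T_f, d)
        Tw, Tf = words.size(0), frames.size(0)
        pairs = torch.cat([words.unsqueeze(1).expand(Tw, Tf, -1),
                           frames.unsqueeze(0).expand(Tw, Tf, -1)], dim=-1)
        attn = F.softmax(self.score(pairs).squeeze(-1), dim=1)     # per-word attention over frames
        attended = attn @ frames                                   # (T_w, d) visual summary per word
        gate = torch.sigmoid(self.word_gate(words))                # down-weights uninformative words
        return (gate * attended).sum(dim=0)                        # question-aware video code

video = torch.rand(20, 256)     # 20 frame features (toy)
question = torch.rand(7, 256)   # 7 word embeddings (toy)
print(WordLevelAttention()(question, video).shape)   # torch.Size([256])
```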

As one of the most popular unsupervised learning approaches, the autoencoder aims at transforming inputs to outputs with the least discrepancy. The conventional autoencoder and its variants only consider one-to-one reconstruction, which ignores the intrinsic structure of the data and may lead to overfitting. In order to preserve the latent geometric information in the data, we propose stacked similarity-aware autoencoders. To train each single autoencoder, we first obtain the pseudo class label of each sample by clustering the input features. Then the hidden...

10.24963/ijcai.2017/216 article EN 2017-07-28
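
A hedged sketch of the similarity-aware idea above: obtain pseudo class labels by clustering the inputs, then train an autoencoder with a reconstruction loss plus a term that pulls hidden codes of same-cluster samples together. The exact similarity term in the paper may differ; this is a simple variant for illustration.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

x = torch.rand(200, 64)                                   # toy input features
pseudo = torch.as_tensor(KMeans(n_clusters=5, n_init=10).fit_predict(x.numpy()), dtype=torch.long)

enc, dec = nn.Linear(64, 16), nn.Linear(16, 64)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for _ in range(10):
    z = torch.relu(enc(x))
    recon_loss = nn.functional.mse_loss(dec(z), x)
    # similarity-aware term: hidden codes should stay close to their pseudo-class centroid
    centroids = torch.stack([z[pseudo == k].mean(dim=0) for k in range(5)])
    sim_loss = nn.functional.mse_loss(z, centroids[pseudo])
    loss = recon_loss + 0.1 * sim_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```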

Most semantic segmentation models treat semantic segmentation as a pixel-wise classification task and use the pixel-wise classification error as their optimization criterion. However, the pixel-wise error ignores the strong dependencies among pixels in an image, which limits the performance of the model. Several ways to incorporate the structure information of objects have been investigated, e.g., conditional random fields (CRF), image-prior-based methods, and generative adversarial networks (GAN). Nevertheless, these methods usually require extra model branches or additional...

10.48550/arxiv.1910.08711 preprint EN cc-by-nc-sa arXiv (Cornell University) 2019-01-01
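
The abstract above argues for injecting structure information without extra model branches. One common way to do that, shown here purely as an illustration (not necessarily the paper's loss), is an affinity term: neighboring-pixel similarity in the prediction should match that of the ground-truth label map.

```python
import torch
import torch.nn.functional as F

def affinity_loss(logits, labels):
    """logits: (N, C, H, W); labels: (N, H, W) integer class map."""
    prob = logits.softmax(dim=1)
    onehot = F.one_hot(labels, num_classes=logits.size(1)).permute(0, 3, 1, 2).float()
    loss = 0.0
    for dy, dx in [(0, 1), (1, 0)]:                     # right and bottom neighbors
        p_sim = (prob[..., :prob.size(2) - dy, :prob.size(3) - dx]
                 * prob[..., dy:, dx:]).sum(dim=1)      # predicted pairwise affinity
        g_sim = (onehot[..., :onehot.size(2) - dy, :onehot.size(3) - dx]
                 * onehot[..., dy:, dx:]).sum(dim=1)    # ground-truth pairwise affinity
        loss = loss + F.mse_loss(p_sim, g_sim)
    return loss

logits = torch.randn(2, 21, 64, 64, requires_grad=True)
labels = torch.randint(0, 21, (2, 64, 64))
total = F.cross_entropy(logits, labels) + 0.5 * affinity_loss(logits, labels)
total.backward()  # pixel-wise loss plus a structure-aware term, no extra branches needed
```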

A caricature is an artistic form of a person's picture in which certain striking characteristics are abstracted or exaggerated in order to create a humor or sarcasm effect. For numerous related applications such as attribute recognition and editing, face parsing is an essential pre-processing step that provides a complete facial structure understanding. However, current state-of-the-art face parsing methods require large amounts of labeled data on the pixel level, and such a process is tedious and labor-intensive. For real photos, there...

10.1109/icip.2019.8803517 preprint EN 2019 IEEE International Conference on Image Processing (ICIP) 2019-08-26

Caricature is an artistic drawing created to abstract or exaggerate the facial features of a person. Rendering visually pleasing caricatures is a difficult task that requires professional skills, and thus it is of great interest to design a method to automatically generate such drawings. To deal with large shape changes, we propose an algorithm based on a semantic shape transform to produce diverse and plausible shape exaggerations. Specifically, we predict pixel-wise semantic correspondences and perform image warping on the input photo to achieve...

10.1007/s11263-021-01489-1 article EN cc-by International Journal of Computer Vision 2021-07-09
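
A minimal sketch of the warping step described above: a network predicts pixel-wise correspondences (a dense displacement field) and the input photo is warped accordingly to exaggerate the shape. The correspondence predictor here is a toy stand-in; in the paper it is driven by semantic shape information.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrespondenceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, 1, 1), nn.ReLU(),
                                 nn.Conv2d(16, 2, 3, 1, 1), nn.Tanh())
    def forward(self, x):
        return 0.1 * self.net(x)           # small per-pixel displacement in [-0.1, 0.1]

photo = torch.rand(1, 3, 128, 128)
disp = CorrespondenceNet()(photo)          # (1, 2, H, W) displacement field

# Base sampling grid in normalized coordinates, then add the predicted displacements.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 128), torch.linspace(-1, 1, 128), indexing="ij")
base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)            # (1, H, W, 2)
warp_grid = base_grid + disp.permute(0, 2, 3, 1)                  # apply correspondences
caricature = F.grid_sample(photo, warp_grid, align_corners=True)  # warped output
print(caricature.shape)
```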

Fine-grained object retrieval, which aims at finding objects belonging to the same sub-category as the probe from a large database, is becoming increasingly popular because of its research and application significance. Recently, convolutional neural network (CNN) based deep learning models have achieved promising retrieval performance, as they can learn both feature representations and discriminative distance metrics jointly. Specifically, a generic method is to extract the activations of a fully-connected layer...

10.1145/3126686.3126708 article EN 2017-10-23
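
A simple sketch of the generic pipeline the abstract refers to: extract deep activations (here the penultimate layer of a ResNet-18 stand-in) as descriptors and rank database images by cosine similarity to the probe. The paper goes further with joint metric learning, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()                 # keep the 512-d penultimate activations
backbone.eval()

with torch.no_grad():
    probe = backbone(torch.rand(1, 3, 224, 224))        # probe descriptor
    database = backbone(torch.rand(32, 3, 224, 224))    # database descriptors

probe = nn.functional.normalize(probe, dim=1)
database = nn.functional.normalize(database, dim=1)
scores = database @ probe.t()                            # cosine similarity
ranking = scores.squeeze(1).argsort(descending=True)     # retrieval order
print(ranking[:5])                                       # top-5 candidates
```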

In this article, we propose a novel deep Siamese architecture based on a convolutional neural network (CNN) and multi-level similarity perception for the person re-identification (re-ID) problem. According to the distinct characteristics of diverse feature maps, we effectively apply different similarity constraints to both low-level and high-level feature maps during the training stage. Due to the introduction of appropriate comparison mechanisms at different levels, the proposed approach can adaptively learn discriminative local and global feature representations,...

10.1145/3309881 article EN ACM Transactions on Multimedia Computing Communications and Applications 2019-05-31
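
A hedged sketch of multi-level similarity perception for re-ID as described above: a Siamese CNN shares weights over an image pair, and different similarity constraints are applied to low-level feature maps and high-level embeddings. The particular losses below (pixel-wise map similarity plus a contrastive term) are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.low = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())   # low-level maps
        self.high = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten()) # global embedding
    def forward(self, x):
        low = self.low(x)
        return low, self.high(low)

net = SiameseBackbone()
img_a, img_b = torch.rand(8, 3, 128, 64), torch.rand(8, 3, 128, 64)
same_id = torch.randint(0, 2, (8,)).float()              # 1 = same person

low_a, emb_a = net(img_a)
low_b, emb_b = net(img_b)

low_mse = F.mse_loss(low_a, low_b, reduction="none").mean(dim=(1, 2, 3))  # low-level constraint
dist = F.pairwise_distance(emb_a, emb_b)                                  # high-level constraint
margin = 1.0
loss = (same_id * low_mse + same_id * dist.pow(2)
        + (1 - same_id) * F.relu(margin - dist).pow(2)).mean()
loss.backward()  # constraints at both feature levels shape local and global cues
```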

While considerable progress has been made in achieving accurate lip synchronization for 3D speech-driven talking face generation, the task of incorporating expressive facial detail synthesis aligned with the speaker's speaking status remains challenging. Existing efforts either focus on learning a dynamic head pose synchronized with the speech rhythm or aim at stylized facial movements guided by an external reference such as emotion labels or video clips. The former works often yield coarse alignment, neglecting...

10.1109/access.2024.3390182 article EN cc-by-nc-nd IEEE Access 2024-01-01