Yong Zhang

ORCID: 0000-0003-0066-3448
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Adversarial Robustness in Machine Learning
  • Face Recognition and Analysis
  • Advanced Image Processing Techniques
  • Anomaly Detection Techniques and Applications
  • Advanced Vision and Imaging
  • Advanced Steganography and Watermarking Techniques
  • Chaos-based Image/Signal Encryption
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Computer Graphics and Visualization Techniques
  • Speech and Audio Processing
  • Advanced Neural Network Applications
  • Human Motion and Animation
  • Human Pose and Action Recognition
  • Natural Language Processing Techniques
  • Digital Media Forensic Detection
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • 3D Shape Modeling and Analysis
  • Face and Expression Recognition
  • Video Surveillance and Tracking Methods
  • Image Enhancement Techniques
  • AI in Cancer Detection

Huawei Technologies (Canada)
2021-2025

University of Alabama
2024

Southern Medical University
2024

Nanfang Hospital
2024

Liaoning Cancer Hospital & Institute
2022-2024

Tencent (China)
2019-2024

University of Science and Technology Liaoning
2013-2024

Central China Normal University
2021-2024

Geological Survey of Alabama
2024

Tianjin University of Commerce
2024

Recent studies in deepfake detection have yielded promising results when the training and testing face forgeries come from the same dataset. However, the problem remains challenging when one tries to generalize the detector to forgeries created by unseen methods. This work addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations, and to strengthen the "sensitivity" to forgeries by enforcing...

10.1109/cvpr52688.2022.01815 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
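
To make the augmentation idea concrete, here is a minimal sketch (not the paper's implementation) of synthesizing a pseudo-forgery: a color-shifted copy of a face is blended back into the original through a random soft mask, so the blending boundary and its configuration become the learning signal. All parameter ranges below are illustrative assumptions.

```python
import numpy as np

def synthesize_pseudo_forgery(face: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """face: HxWx3 float array in [0, 1]; returns a blended pseudo-forgery."""
    h, w, _ = face.shape
    # One entry from a hypothetical forgery-configuration pool: a random color shift.
    shifted = np.clip(face + rng.uniform(-0.1, 0.1, size=(1, 1, 3)), 0.0, 1.0)
    # Random elliptical blending mask with a soft falloff.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = rng.uniform(0.3, 0.7) * h, rng.uniform(0.3, 0.7) * w
    ry, rx = rng.uniform(0.2, 0.4) * h, rng.uniform(0.2, 0.4) * w
    dist = ((ys - cy) / ry) ** 2 + ((xs - cx) / rx) ** 2
    mask = np.clip(1.0 - dist, 0.0, 1.0)[..., None]
    # The blending boundary is what a sensitive detector should learn to spot.
    return mask * shifted + (1.0 - mask) * face
```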

Image inpainting has made remarkable progress with recent advances in deep learning. Popular networks mainly follow an encoder-decoder architecture (sometimes with skip connections) and possess a sufficiently large receptive field, i.e., larger than the image resolution. The receptive field refers to the set of input pixels that are path-connected to a neuron. For the inpainting task, however, the size of the surrounding area needed to repair different kinds of missing regions differs, and a very large receptive field is not always optimal, especially for local...

10.1109/tip.2022.3152624 article EN IEEE Transactions on Image Processing 2022-01-01
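
Since the argument turns on how large a network's receptive field actually is, a small helper (illustrative, not from the paper) can compute the theoretical receptive field of a conv stack and check whether it exceeds the image resolution:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, input-to-output order."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) * current jump
        jump *= s             # stride compounds the spacing between receptive centers
    return rf

# Example: four stride-2 3x3 convs followed by two stride-1 3x3 convs.
print(receptive_field([(3, 2)] * 4 + [(3, 1)] * 2))  # -> 95 pixels
```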

Generating talking head videos from a face image and a piece of speech audio still contains many challenges, e.g., unnatural head movement, distorted expression, and identity modification. We argue that these issues are mainly caused by learning from coupled 2D motion fields. On the other hand, explicitly using 3D information also suffers from problems of stiff expression and incoherent video. We present SadTalker, which generates the 3D motion coefficients (head pose, expression) of a 3DMM from audio and implicitly modulates a novel 3D-aware face render...

10.1109/cvpr52729.2023.00836 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Previous portrait image generation methods roughly fall into two categories: 2D GANs and 3D-aware GANs. 2D GANs can generate high-fidelity portraits but with low view consistency. 3D-aware GANs maintain view consistency, but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally editable portrait images. Our method uses two decoupled latent codes to generate the corresponding facial semantics and texture in a spatially aligned 3D volume with shared geometry...

10.1109/cvpr52688.2022.00752 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

In this work, we investigate a simple and widely known conditional generative framework based on the Vector Quantised-Variational AutoEncoder (VQ-VAE) and the Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations. For the GPT, we incorporate a corruption strategy during training to alleviate the training-testing discrepancy. Despite its...

10.1109/cvpr52729.2023.01415 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
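
The corruption strategy mentioned above can be sketched simply, assuming (as the abstract suggests) that it amounts to randomly replacing a fraction of ground-truth codebook indices during teacher-forced training, so the GPT learns to cope with imperfect histories like those it produces at inference:

```python
import torch

def corrupt_tokens(tokens: torch.Tensor, codebook_size: int, p: float = 0.1) -> torch.Tensor:
    """tokens: (batch, length) long tensor of VQ-VAE code indices."""
    noise = torch.randint_like(tokens, codebook_size)          # random replacement codes
    mask = torch.rand(tokens.shape, device=tokens.device) < p  # which positions to corrupt
    return torch.where(mask, noise, tokens)
```

The corruption rate p here is a hypothetical value, not the paper's setting.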

Diffusion-based generative models have achieved remarkable success in text-based image generation. However, since the generation process contains enormous randomness, it is still challenging to apply such models to real-world visual content editing, especially videos. In this paper, we propose FateZero, a zero-shot text-based editing method for real-world videos that requires no per-prompt training or use-specific mask. To edit videos consistently, we propose several techniques based on the pre-trained models. Firstly, in contrast to the straightforward DDIM...

10.1109/iccv51070.2023.01460 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
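
For context on the "straightforward DDIM" baseline the abstract contrasts against, here is the standard DDIM inversion update (textbook form, not FateZero-specific): the model's noise prediction deterministically maps the latent from step t to step t+1, which is what lets a source video's latents and attention maps be recorded for later editing.

```python
import torch

def ddim_inversion_step(x_t, eps, alpha_t, alpha_next):
    """alpha_t, alpha_next: cumulative noise-schedule products at steps t and t+1."""
    # Predict the clean latent from the current noisy latent and the noise estimate.
    pred_x0 = (x_t - (1 - alpha_t) ** 0.5 * eps) / alpha_t ** 0.5
    # Re-noise deterministically toward the next (noisier) timestep.
    return alpha_next ** 0.5 * pred_x0 + (1 - alpha_next) ** 0.5 * eps
```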

3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery. Towards fine-grained control over facial attributes, recent efforts incorporate the 3D Morphable Face Model (3DMM) to describe deformation in radiance fields either explicitly or implicitly. Explicit methods provide fine-grained expression control but cannot handle topological changes caused by hair and accessories, while implicit ones can model varied topologies...

10.1109/cvpr52729.2023.02011 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

10.1109/cvpr52733.2024.00698 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for precise control. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g., frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves...

10.1109/tvcg.2024.3365804 article EN IEEE Transactions on Visualization and Computer Graphics 2024-01-01

We present VideoReTalking, a new system that edits the faces of a real-world talking head video according to input audio, producing a high-quality, lip-synced output even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using an expression editing network, resulting in a video with the canonical expression. This...

10.1145/3550469.3555399 article EN SIGGRAPH Asia 2022 Conference Papers 2022-11-29
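
The three-stage decomposition reads naturally as a pipeline. The sketch below is schematic only: every stage name is a hypothetical placeholder standing in for the corresponding network, not the released API.

```python
def retalk(frames, audio, expression_editor, lipsync_net, enhancer, template):
    # 1) Re-normalize every frame to the same canonical expression template.
    canonical = [expression_editor(frame, template) for frame in frames]
    # 2) Generate lip motion for the whole sequence from the driving audio.
    synced = lipsync_net(canonical, audio)
    # 3) Enhance each synthesized frame for photo-realism.
    return [enhancer(frame) for frame in synced]
```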

Deepfake detection remains a challenging task due to the difficulty of generalizing to new types of forgeries. The problem primarily stems from the overfitting of existing methods to forgery-irrelevant features and method-specific patterns. The latter has rarely been studied and is not well addressed by previous works. This paper presents a novel approach to address the two issues by uncovering common forgery features. Specifically, we first propose a disentanglement framework that decomposes image information into three distinct...

10.1109/iccv51070.2023.02048 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Adversarial training (AT) has been demonstrated to be effective in improving model robustness by leveraging adversarial examples for training. However, most AT methods face expensive time and computational costs for calculating gradients at multiple steps when generating adversarial examples. To boost training efficiency, the fast gradient sign method (FGSM) is adopted in fast AT methods by calculating the gradient only once. Unfortunately, the resulting robustness is far from satisfactory. One reason may arise from the initialization fashion: existing fast AT generally uses a random, sample-agnostic...

10.1109/tip.2022.3184255 article EN IEEE Transactions on Image Processing 2022-01-01
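
For reference, the sample-agnostic baseline the abstract criticizes looks like the following (a standard FGSM step with a uniform random start; per the abstract, the paper's contribution is to replace that random initialization with a sample-dependent one):

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255, alpha=10 / 255):
    """One-step adversarial example with a uniform random start (FGSM-RS style)."""
    delta = torch.empty_like(x).uniform_(-eps, eps)  # sample-agnostic initialization
    delta.requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).clamp(0, 1)
```

The eps and alpha values follow common CIFAR-style settings and are assumptions, not the paper's.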

We present a novel paradigm for high-fidelity face swapping that faithfully preserves the desired subtle geometry and texture details. We rethink face swapping from the perspective of fine-grained face editing, i.e., "editing for swapping" (E4S), and propose a framework based on the explicit disentanglement of the shape and texture of facial components. Following the E4S principle, our framework enables both global and local swapping of facial features, as well as controlling the amount of partial swapping specified by the user. Furthermore, the framework is inherently capable of handling facial occlusions by means of masks. At the core of our system...

10.1109/cvpr52729.2023.00829 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
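
The "editing for swapping" idea implies that swapping reduces to exchanging per-component representations. A minimal sketch, with all names and the component list as assumptions:

```python
def swap_component_codes(src_codes: dict, tgt_codes: dict, components: list) -> dict:
    """src/tgt_codes: component name -> per-region style tensor; returns swapped codes."""
    out = dict(tgt_codes)            # start from the target face's codes
    for name in components:          # e.g. ["skin", "nose", "eyes"]
        out[name] = src_codes[name]  # take the source identity's code for this part
    return out
```

Partial swapping then corresponds to interpolating, rather than replacing, a component's code.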

This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the canonical space of a person in a learning-based way, and at the second stage, we refine the surface detail by estimating the non-rigid deformation in the posed space in an optimization way. A hyper-network is utilized...

10.1109/cvpr52729.2023.00837 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

One-shot video-driven talking face generation aims at producing a synthetic talking video by transferring facial motion from a driving video to an arbitrary portrait image. Head pose and expression are always entangled in facial motion and transferred simultaneously. However, this entanglement sets up a barrier to using these methods in video portrait editing directly, where it may be required to modify the expression only while keeping the pose unchanged. One challenge of decoupling pose and expression is the lack of paired data, such as the same pose but different expressions. Only a few methods attempt to tackle this challenge with...

10.1109/cvpr52729.2023.00049 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Image inpainting aims to fill in the missing holes of the input image. It is hard to solve this task efficiently for high-resolution images, for two reasons: (1) a large receptive field needs to be handled for high-resolution image inpainting; (2) the general encoder-decoder network synthesizes many background pixels synchronously in matrix form. In this paper, we try to break the above limitations for the first time, thanks to recent developments in continuous implicit representation. In detail, we down-sample and encode the degraded image to produce spatial-adaptive...

10.1609/aaai.v37i2.25263 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
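
The continuous implicit representation mentioned above can be illustrated with a coordinate-conditioned decoder (a stand-in sketch, not the paper's network): features computed at low resolution are sampled at arbitrary continuous positions, so the repaired image can be decoded at any target resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitDecoder(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # per-pixel RGB
        )

    def forward(self, feat: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        """feat: (B, C, h, w) low-res features; coords: (B, N, 2) in [-1, 1]."""
        # Bilinearly sample the feature map at continuous positions.
        sampled = F.grid_sample(feat, coords.unsqueeze(1), align_corners=False)
        sampled = sampled.squeeze(2).permute(0, 2, 1)  # (B, N, C)
        return self.mlp(torch.cat([sampled, coords], dim=-1))
```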

Recently, a surge of high-quality 3D-aware GANs has been proposed, leveraging the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods that project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion. Although the facial prior is preserved in pre-trained 3D GANs, reconstructing a 3D portrait from only one monocular image is still an ill-posed problem. A straightforward application of 2D GAN inversion methods focuses on texture similarity...

10.1109/cvpr52729.2023.00041 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Fast adversarial training (FAT) is an efficient method to improve robustness in white-box attack scenarios. However, the original FAT suffers from catastrophic overfitting, which dramatically and suddenly reduces robustness after a few training epochs. Although various FAT variants have been proposed to prevent overfitting, they require high training time. In this paper, we investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and FAT. We find that catastrophic overfitting occurs when the attack success rate of the adversarial examples becomes worse. Based...

10.1109/tpami.2024.3381180 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-03-26
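
Given the finding that overfitting coincides with a drop in attack quality, a training loop can watch for it directly. A small monitoring utility in that spirit (names and any alert threshold are illustrative):

```python
import torch

@torch.no_grad()
def attack_success_rate(model, x_adv: torch.Tensor, y: torch.Tensor) -> float:
    """Fraction of adversarial examples the model misclassifies."""
    pred = model(x_adv).argmax(dim=1)
    return (pred != y).float().mean().item()
```

A sudden, sustained fall in this rate during FAT would flag the onset of catastrophic overfitting.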

Motor imagery (MI) is a cognitive process wherein an individual mentally rehearses a specific movement without physically executing it. Recently, MI-based brain-computer interfaces (BCIs) have attracted widespread attention. However, accurate decoding of MI and understanding of its neural mechanisms still face huge challenges, which seriously hinder the clinical application and development of BCI systems based on MI. Thus, it is very necessary to develop new methods to decode MI tasks. In this work, we propose...

10.1093/cercor/bhad511 article EN Cerebral Cortex 2024-01-05

In existing visual representation learning tasks, deep convolutional neural networks (CNNs) are often trained on images annotated with a single tag, such as those in ImageNet. However, a single tag cannot describe all the important contents of one image, and some useful visual information may be wasted during training. In this work, we propose to train CNNs from images annotated with multiple tags, to enhance the quality of the trained CNN model. To this end, we build a large-scale multi-label image database with 18M images and 11K categories, dubbed Tencent ML-Images...

10.1109/access.2019.2956775 article EN cc-by IEEE Access 2019-01-01
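
Training on multi-hot tags rather than a single label changes mainly the loss: softmax cross-entropy is replaced with independent per-tag binary cross-entropy. A minimal training step reflecting that standard setup (illustrative, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def multilabel_step(model, images: torch.Tensor, tags: torch.Tensor, optimizer) -> float:
    """images: (B, 3, H, W); tags: (B, num_classes) multi-hot float tensor."""
    logits = model(images)
    loss = F.binary_cross_entropy_with_logits(logits, tags)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```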

Video deblurring is still an unsolved problem due to the challenging spatio-temporal modeling process, and existing convolutional neural network (CNN)-based methods show a limited capacity for effective spatial and temporal modeling. This paper presents VDTR, a Transformer-based model that makes the first attempt to adapt the pure Transformer to video deblurring. VDTR exploits the Transformer's superior long-range relation modeling capabilities for both spatial and temporal modeling. However, it is non-trivial to design an appropriate Transformer for the complicated non-uniform blurs, misalignment...

10.1109/tcsvt.2022.3201045 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-08-23