- Generative Adversarial Networks and Image Synthesis
- Adversarial Robustness in Machine Learning
- Face Recognition and Analysis
- Advanced Image Processing Techniques
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Advanced Steganography and Watermarking Techniques
- Chaos-based Image/Signal Encryption
- Video Analysis and Summarization
- Multimodal Machine Learning Applications
- Topic Modeling
- Computer Graphics and Visualization Techniques
- Speech and Audio Processing
- Advanced Neural Network Applications
- Human Motion and Animation
- Human Pose and Action Recognition
- Natural Language Processing Techniques
- Digital Media Forensic Detection
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- 3D Shape Modeling and Analysis
- Face and Expression Recognition
- Video Surveillance and Tracking Methods
- Image Enhancement Techniques
- AI in Cancer Detection
Huawei Technologies (Canada)
2021-2025
University of Alabama
2024
Southern Medical University
2024
Nanfang Hospital
2024
Liaoning Cancer Hospital & Institute
2022-2024
Tencent (China)
2019-2024
University of Science and Technology Liaoning
2013-2024
Central China Normal University
2021-2024
Geological Survey of Alabama
2024
Tianjin University of Commerce
2024
Recent studies in deepfake detection have yielded promising results when the training and testing face forgeries are from the same dataset. However, the problem remains challenging when one tries to generalize the detector to forgeries created by unseen methods. This work addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to forgeries by enforcing...
Image inpainting has made remarkable progress with recent advances in deep learning. Popular networks mainly follow an encoder-decoder architecture (sometimes with skip connections) and possess a sufficiently large receptive field, i.e., larger than the image resolution. The receptive field refers to the set of input pixels that are path-connected to a neuron. For the inpainting task, however, the size of the surrounding areas needed to repair different kinds of missing regions is different, and a very large receptive field is not always optimal, especially for local...
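The abstract above reasons about receptive-field size relative to image resolution. A minimal sketch of how that size is computed for a stack of convolutions, using the standard recurrence (the layer list is illustrative, not taken from the paper):

```python
def receptive_field(layers):
    """Receptive-field size of stacked convs.

    layers: list of (kernel_size, stride) tuples, input to output.
    Uses the standard recurrence: r <- r + (k - 1) * j ; j <- j * s,
    where r is the receptive field and j the cumulative stride.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Example: three 3x3 convs, each with stride 2.
layers = [(3, 2), (3, 2), (3, 2)]
print(receptive_field(layers))  # 15
```

Stacking more strided layers grows the receptive field geometrically, which is how encoder-decoder inpainting networks reach fields larger than the image itself.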
Generating talking head videos from a face image and a piece of speech audio still entails many challenges, i.e., unnatural head movement, distorted expression, and identity modification. We argue that these issues are mainly caused by learning from the coupled 2D motion fields. On the other hand, explicitly using 3D information also suffers from problems of stiff expression and incoherent video. We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render...
Previous portrait image generation methods roughly fall into two categories: 2D GANs and 3D-aware GANs. 2D GANs can generate high-fidelity portraits but with low view consistency. 3D-aware GANs maintain view consistency but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally-editable portrait images. Our method uses two decoupled latent codes to generate corresponding facial semantics and texture in a spatially aligned 3D volume with shared geometry....
In this work, we investigate a simple and must-known conditional generative framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) and Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations. For GPT, we incorporate a corruption strategy during training to alleviate the training-testing discrepancy. Despite its...
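The EMA recipe mentioned above updates the VQ-VAE codebook with exponential moving averages of assignments rather than a gradient step. A minimal numpy sketch of that update (codebook size, dimensions, and smoothing constants are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 4                        # codebook size, code dimension
codebook = rng.normal(size=(K, D))
ema_count = np.ones(K)             # EMA cluster sizes
ema_sum = codebook.copy()          # EMA sum of assigned vectors
decay, eps = 0.99, 1e-5

def quantize_and_update(x):
    """x: (N, D) batch of encoder outputs; returns quantised vectors."""
    global codebook, ema_count, ema_sum
    # Nearest-neighbour assignment to codebook entries.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(1)
    onehot = np.eye(K)[idx]                                    # (N, K)
    # EMA updates of per-code counts and sums, then re-normalise codes.
    ema_count = decay * ema_count + (1 - decay) * onehot.sum(0)
    ema_sum = decay * ema_sum + (1 - decay) * onehot.T @ x
    n = ema_count.sum()
    count = (ema_count + eps) / (n + K * eps) * n  # Laplace smoothing
    codebook = ema_sum / count[:, None]
    return codebook[idx]

x = rng.normal(size=(32, D))
q = quantize_and_update(x)
print(q.shape)  # (32, 4)
```

Code Reset, the other recipe named in the abstract, would additionally re-initialise codes whose `ema_count` decays toward zero; that step is omitted here for brevity.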
Diffusion-based generative models have achieved remarkable success in text-based image generation. However, since the generation process contains enormous randomness, it is still challenging to apply such models to real-world visual content editing, especially in videos. In this paper, we propose FateZero, a zero-shot text-based editing method for real-world videos without per-prompt training or a user-specified mask. To edit videos consistently, we propose several techniques based on the pre-trained models. Firstly, in contrast to the straightforward DDIM...
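The "straightforward DDIM" the abstract contrasts against refers to deterministic DDIM inversion, which maps a clean latent back through the noise schedule so it can be regenerated under an edited prompt. A toy sketch of one inversion step; the noise prediction here is a stand-in constant, not a trained model, and the alpha-bar values are illustrative:

```python
import numpy as np

def ddim_invert_step(x_t, abar_t, abar_next, eps_pred):
    """Deterministic (sigma = 0) DDIM step from abar_t to abar_next.

    First recovers the predicted clean sample x0 from x_t, then
    re-noises it at the next schedule point with the same eps.
    """
    x0 = (x_t - np.sqrt(1 - abar_t) * eps_pred) / np.sqrt(abar_t)
    return np.sqrt(abar_next) * x0 + np.sqrt(1 - abar_next) * eps_pred

# With a fixed eps, running the step backwards exactly undoes it,
# which is what makes the inverted latents reusable for editing.
x = np.array([0.5, -0.2, 1.0])
eps = np.array([0.1, 0.0, -0.3])  # pretend model prediction
x_next = ddim_invert_step(x, 0.9, 0.8, eps)
x_back = ddim_invert_step(x_next, 0.8, 0.9, eps)
print(np.allclose(x_back, x))  # True
```

In a real pipeline `eps_pred` comes from the denoising network at each timestep, so the round trip is only approximate; that approximation error is one source of the inconsistency zero-shot video editing methods must fight.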
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery. Towards fine-grained control over facial attributes, recent efforts incorporate the 3D Morphable Face Model (3DMM) to describe deformation in the radiance fields either explicitly or implicitly. Explicit methods provide fine-grained expression control but cannot handle topological changes caused by hair and accessories, while implicit ones can model varied topologies...
Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for controlling motion precisely. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g. frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves...
We present VideoReTalking, a new system that edits the faces of a real-world talking head video according to input audio, producing a high-quality and lip-synced output even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using an editing network, resulting in a video with the canonical expression. This...
Deepfake detection remains a challenging task due to the difficulty of generalizing to new types of forgeries. This problem primarily stems from the overfitting of existing methods to forgery-irrelevant features and method-specific patterns. The latter has been rarely studied and not well addressed by previous works. This paper presents a novel approach to address the two issues by uncovering common forgery features. Specifically, we first propose a disentanglement framework that decomposes image information into three distinct...
Adversarial training (AT) has been demonstrated to be effective in improving model robustness by leveraging adversarial examples for training. However, most AT methods face expensive time and computational costs for calculating gradients at multiple steps when generating adversarial examples. To boost training efficiency, the fast gradient sign method (FGSM) is adopted in fast AT methods by calculating the gradient only once. Unfortunately, the resulting robustness is far from satisfactory. One reason may arise from the initialization fashion: existing fast AT generally uses a random sample-agnostic...
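The single-gradient-step attack referenced above can be shown in a few lines. A minimal FGSM sketch on a toy linear model with squared loss (the model, data, and epsilon are illustrative choices, not from the paper):

```python
import numpy as np

def fgsm(x, grad, eps):
    """One FGSM step: move x by eps in the sign of the loss gradient."""
    return x + eps * np.sign(grad)

# Toy model: loss(x) = 0.5 * (w @ x - y)^2, so dloss/dx = (w @ x - y) * w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.4])
y = 0.0

grad = (w @ x - y) * w
x_adv = fgsm(x, grad, eps=0.1)

loss = lambda v: 0.5 * (w @ v - y) ** 2
print(loss(x_adv) > loss(x))  # True: one signed step increases the loss
```

Multi-step AT methods such as PGD repeat this step with projection, which is exactly the per-iteration gradient cost that fast AT avoids by attacking only once.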
We present a novel paradigm for high-fidelity face swapping that faithfully preserves the desired subtle geometry and texture details. We rethink face swapping from the perspective of fine-grained face editing, i.e., "editing for swapping" (E4S), and propose a framework that is based on the explicit disentanglement of the shape and texture of facial components. Following the E4S principle, our framework enables both global and local swapping of facial features, as well as controlling the amount of partial swapping specified by the user. Furthermore, the framework is inherently capable of handling occlusions by means of facial masks. At the core of our system...
This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the high accuracy of optimization-based methods with the efficiency of learning-based methods, we propose a coarse-to-fine way to realize high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the canonical space of a person in a learning-based way, and at the second stage we refine the surface detail by estimating the non-rigid deformation in the posed space in an optimization way. A hyper-network is utilized...
One-shot video-driven talking face generation aims at producing a synthetic talking video by transferring the facial motion from a driving video to an arbitrary portrait image. Head pose and facial expression are always entangled in the transferred motion and modified simultaneously. However, this entanglement sets up a barrier for these methods to be used in video portrait editing directly, where it may be required to modify the expression only while maintaining the pose unchanged. One challenge of decoupling pose and expression is the lack of paired data, such as the same pose but different expressions. Only a few methods attempt to tackle this challenge with...
Image inpainting aims to fill the missing hole of the input. It is hard to solve this task efficiently when facing high-resolution images, due to two reasons: (1) a large receptive field needs to be handled for high-resolution image inpainting, and (2) the general encoder-decoder network synthesizes many background pixels synchronously in matrix form. In this paper, we try to break the above limitations for the first time, thanks to recent developments in continuous implicit representation. In detail, we down-sample and encode the degraded image to produce spatial-adaptive...
Recently, a surge of high-quality 3D-aware GANs have been proposed, which leverage the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods that project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion. Although a facial prior is preserved in pre-trained 3D GANs, reconstructing a 3D portrait from only one monocular image is still an ill-posed problem. The straightforward application of 2D GAN inversion focuses on texture similarity...
Fast adversarial training (FAT) is an efficient method for improving robustness in white-box attack scenarios. However, the original FAT suffers from catastrophic overfitting, which dramatically and suddenly reduces robustness after a few training epochs. Although various FAT variants have been proposed to prevent this overfitting, they require high training time. In this paper, we investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and FAT. We find that catastrophic overfitting occurs when the attack success rate of adversarial examples becomes worse. Based...
Motor imagery (MI) is a cognitive process wherein an individual mentally rehearses a specific movement without physically executing it. Recently, MI-based brain–computer interfaces (BCIs) have attracted widespread attention. However, accurate decoding of MI and understanding of its neural mechanisms still face huge challenges. These seriously hinder the clinical application and development of BCI systems based on MI. Thus, it is very necessary to develop new methods to decode MI tasks. In this work, we propose...
In existing visual representation learning tasks, deep convolutional neural networks (CNNs) are often trained on images annotated with a single tag, such as ImageNet. However, a single tag cannot describe all the important contents of one image, and some useful visual information may be wasted during training. In this work, we propose to train CNNs from images annotated with multiple tags, to enhance the quality of the CNN model. To this end, we build a large-scale multi-label image database with 18M images and 11K categories, dubbed...
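Moving from one tag per image to multiple tags changes the training loss: softmax cross-entropy assumes a single class, while multi-label training treats each tag as an independent binary decision. A small numpy sketch of the contrast (logits and tag vectors are illustrative, not from the database above):

```python
import numpy as np

def softmax_ce(logits, target_idx):
    """Single-label loss: exactly one class per image."""
    z = logits - logits.max()                 # stabilised log-softmax
    logp = z - np.log(np.exp(z).sum())
    return -logp[target_idx]

def sigmoid_bce(logits, targets):
    """Multi-label loss: an independent sigmoid per tag."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean()

logits = np.array([2.0, -1.0, 0.5, -2.0])
single = softmax_ce(logits, target_idx=0)                 # only "class 0"
multi = sigmoid_bce(logits, np.array([1., 0., 1., 0.]))   # tags 0 and 2
print(single >= 0 and multi >= 0)  # True: both are valid losses
```

Because the sigmoid outputs are not forced to compete through a normalising sum, the multi-label objective can reward several correct tags on the same image, which is the information a single-tag annotation discards.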
Video deblurring is still an unsolved problem due to the challenging spatio-temporal modeling process, and existing convolutional neural network (CNN)-based methods show a limited capacity for effective spatial and temporal modeling for video deblurring. This paper presents VDTR, a Transformer-based model that makes the first attempt to adapt a pure Transformer for video deblurring. VDTR exploits the superior long-range relation modeling capabilities of the Transformer for both spatial and temporal modeling. However, it is non-trivial to design an appropriate Transformer-based architecture, because of the complicated non-uniform blurs, misalignment...