- Generative Adversarial Networks and Image Synthesis
- Adversarial Robustness in Machine Learning
- Face Recognition and Analysis
- Advanced Image Processing Techniques
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Advanced Steganography and Watermarking Techniques
- Chaos-based Image/Signal Encryption
- Video Analysis and Summarization
- Multimodal Machine Learning Applications
- Topic Modeling
- Computer Graphics and Visualization Techniques
- Speech and Audio Processing
- Advanced Neural Network Applications
- Human Motion and Animation
- Human Pose and Action Recognition
- Natural Language Processing Techniques
- Digital Media Forensic Detection
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- 3D Shape Modeling and Analysis
- Face and Expression Recognition
- Video Surveillance and Tracking Methods
- Image Enhancement Techniques
- AI in Cancer Detection
Huawei Technologies (Canada)
2021-2025
University of Alabama
2024
Southern Medical University
2024
Nanfang Hospital
2024
Liaoning Cancer Hospital & Institute
2022-2024
Tencent (China)
2019-2024
University of Science and Technology Liaoning
2013-2024
Central China Normal University
2021-2024
Geological Survey of Alabama
2024
Tianjin University of Commerce
2024
Recent studies in deepfake detection have yielded promising results when the training and testing face forgeries are from the same dataset. However, the problem remains challenging when one tries to generalize the detector to forgeries created by unseen methods. This work addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to forgeries by enforcing...
Image inpainting has made remarkable progress with recent advances in deep learning. Popular networks mainly follow an encoder-decoder architecture (sometimes with skip connections) and possess a sufficiently large receptive field, i.e., larger than the image resolution. The receptive field refers to the set of input pixels that are path-connected to a neuron. For the inpainting task, however, the size of the surrounding areas needed to repair different kinds of missing regions is different, and a very large receptive field is not always optimal, especially for local...
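The abstract above reasons about receptive-field size relative to image resolution. A minimal sketch of how that size is computed for a stack of convolutions, using the standard recurrence (the layer list is illustrative, not taken from the paper):

```python
def receptive_field(layers):
    """Receptive-field size of stacked convs.

    layers: list of (kernel_size, stride) tuples, input to output.
    Uses the standard recurrence: r <- r + (k - 1) * j ; j <- j * s,
    where r is the receptive field and j the cumulative stride.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Example: three 3x3 convs, each with stride 2.
layers = [(3, 2), (3, 2), (3, 2)]
print(receptive_field(layers))  # 15
```

Stacking more strided layers grows the receptive field geometrically, which is how encoder-decoder inpainting networks reach fields larger than the image itself.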
Generating talking head videos from a face image and a piece of speech audio still entails many challenges, i.e., unnatural head movement, distorted expression, and identity modification. We argue that these issues are mainly caused by learning from the coupled 2D motion fields. On the other hand, explicitly using 3D information also suffers from problems of stiff expression and incoherent video. We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render...
Previous portrait image generation methods roughly fall into two categories: 2D GANs and 3D-aware GANs. 2D GANs can generate high-fidelity portraits but with low view consistency. 3D-aware GANs maintain view consistency but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally-editable portrait images. Our method uses two decoupled latent codes to generate corresponding facial semantics and texture in a spatially aligned 3D volume with shared geometry....
In this work, we investigate a simple and must-known conditional generative framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) and Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations. For GPT, we incorporate a corruption strategy during training to alleviate the training-testing discrepancy. Despite its...
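The EMA recipe mentioned above updates the VQ-VAE codebook with exponential moving averages of assignments rather than a gradient step. A minimal numpy sketch of that update (codebook size, dimensions, and smoothing constants are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 4                        # codebook size, code dimension
codebook = rng.normal(size=(K, D))
ema_count = np.ones(K)             # EMA cluster sizes
ema_sum = codebook.copy()          # EMA sum of assigned vectors
decay, eps = 0.99, 1e-5

def quantize_and_update(x):
    """x: (N, D) batch of encoder outputs; returns quantised vectors."""
    global codebook, ema_count, ema_sum
    # Nearest-neighbour assignment to codebook entries.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(1)
    onehot = np.eye(K)[idx]                                    # (N, K)
    # EMA updates of per-code counts and sums, then re-normalise codes.
    ema_count = decay * ema_count + (1 - decay) * onehot.sum(0)
    ema_sum = decay * ema_sum + (1 - decay) * onehot.T @ x
    n = ema_count.sum()
    count = (ema_count + eps) / (n + K * eps) * n  # Laplace smoothing
    codebook = ema_sum / count[:, None]
    return codebook[idx]

x = rng.normal(size=(32, D))
q = quantize_and_update(x)
print(q.shape)  # (32, 4)
```

Code Reset, the other recipe named in the abstract, would additionally re-initialise codes whose `ema_count` decays toward zero; that step is omitted here for brevity.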
Diffusion-based generative models have achieved remarkable success in text-based image generation. However, since the generation process contains enormous randomness, it is still challenging to apply such models to real-world visual content editing, especially in videos. In this paper, we propose FateZero, a zero-shot text-based editing method for real-world videos without per-prompt training or a user-specified mask. To edit videos consistently, we propose several techniques based on the pre-trained models. Firstly, in contrast to the straightforward DDIM...
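The "straightforward DDIM" the abstract contrasts against refers to deterministic DDIM inversion, which maps a clean latent back through the noise schedule so it can be regenerated under an edited prompt. A toy sketch of one inversion step; the noise prediction here is a stand-in constant, not a trained model, and the alpha-bar values are illustrative:

```python
import numpy as np

def ddim_invert_step(x_t, abar_t, abar_next, eps_pred):
    """Deterministic (sigma = 0) DDIM step from abar_t to abar_next.

    First recovers the predicted clean sample x0 from x_t, then
    re-noises it at the next schedule point with the same eps.
    """
    x0 = (x_t - np.sqrt(1 - abar_t) * eps_pred) / np.sqrt(abar_t)
    return np.sqrt(abar_next) * x0 + np.sqrt(1 - abar_next) * eps_pred

# With a fixed eps, running the step backwards exactly undoes it,
# which is what makes the inverted latents reusable for editing.
x = np.array([0.5, -0.2, 1.0])
eps = np.array([0.1, 0.0, -0.3])  # pretend model prediction
x_next = ddim_invert_step(x, 0.9, 0.8, eps)
x_back = ddim_invert_step(x_next, 0.8, 0.9, eps)
print(np.allclose(x_back, x))  # True
```

In a real pipeline `eps_pred` comes from the denoising network at each timestep, so the round trip is only approximate; that approximation error is one source of the inconsistency zero-shot video editing methods must fight.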
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery. Towards fine-grained control over facial attributes, recent efforts incorporate the 3D Morphable Face Model (3DMM) to describe deformation in the radiance fields either explicitly or implicitly. Explicit methods provide fine-grained expression control but cannot handle topological changes caused by hair and accessories, while implicit ones can model varied topologies...
Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for controlling motion precisely. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g. frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves...
We present VideoReTalking, a new system that edits the faces of a real-world talking head video according to input audio, producing a high-quality and lip-synced output even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using an editing network, resulting in a video with the canonical expression. This...
Deepfake detection remains a challenging task due to the difficulty of generalizing to new types of forgeries. This problem primarily stems from the overfitting of existing methods to forgery-irrelevant features and method-specific patterns. The latter has been rarely studied and not well addressed by previous works. This paper presents a novel approach to address the two issues by uncovering common forgery features. Specifically, we first propose a disentanglement framework that decomposes image information into three distinct...
Adversarial training (AT) has been demonstrated to be effective in improving model robustness by leveraging adversarial examples for training. However, most AT methods face expensive time and computational costs for calculating gradients at multiple steps when generating adversarial examples. To boost training efficiency, the fast gradient sign method (FGSM) is adopted in fast AT methods by calculating the gradient only once. Unfortunately, the resulting robustness is far from satisfactory. One reason may arise from the initialization fashion: existing fast AT generally uses a random sample-agnostic...
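The single-gradient-step attack referenced above can be shown in a few lines. A minimal FGSM sketch on a toy linear model with squared loss (the model, data, and epsilon are illustrative choices, not from the paper):

```python
import numpy as np

def fgsm(x, grad, eps):
    """One FGSM step: move x by eps in the sign of the loss gradient."""
    return x + eps * np.sign(grad)

# Toy model: loss(x) = 0.5 * (w @ x - y)^2, so dloss/dx = (w @ x - y) * w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.4])
y = 0.0

grad = (w @ x - y) * w
x_adv = fgsm(x, grad, eps=0.1)

loss = lambda v: 0.5 * (w @ v - y) ** 2
print(loss(x_adv) > loss(x))  # True: one signed step increases the loss
```

Multi-step AT methods such as PGD repeat this step with projection, which is exactly the per-iteration gradient cost that fast AT avoids by attacking only once.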
We present a novel paradigm for high-fidelity face swapping that faithfully preserves the desired subtle geometry and texture details. We rethink face swapping from the perspective of fine-grained face editing, i.e., "editing for swapping" (E4S), and propose a framework that is based on the explicit disentanglement of the shape and texture of facial components. Following the E4S principle, our framework enables both global and local swapping of facial features, as well as controlling the amount of partial swapping specified by the user. Furthermore, the framework is inherently capable of handling occlusions by means of facial masks. At the core of our system...
This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the high accuracy of optimization-based methods with the efficiency of learning-based methods, we propose a coarse-to-fine way to realize high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the canonical space of a person in a learning-based way, and at the second stage we refine the surface detail by estimating the non-rigid deformation in the posed space in an optimization way. A hyper-network is utilized...
One-shot video-driven talking face generation aims at producing a synthetic talking video by transferring the facial motion from a driving video to an arbitrary portrait image. Head pose and facial expression are always entangled in the transferred motion and modified simultaneously. However, this entanglement sets up a barrier for these methods to be used in video portrait editing directly, where it may be required to modify the expression only while maintaining the pose unchanged. One challenge of decoupling pose and expression is the lack of paired data, such as the same pose but different expressions. Only a few methods attempt to tackle this challenge with...
Image inpainting aims to fill the missing hole of the input. It is hard to solve this task efficiently when facing high-resolution images, due to two reasons: (1) a large receptive field needs to be handled for high-resolution image inpainting, and (2) the general encoder-decoder network synthesizes many background pixels synchronously in matrix form. In this paper, we try to break the above limitations for the first time, thanks to recent developments in continuous implicit representation. In detail, we down-sample and encode the degraded image to produce spatial-adaptive...
Recently, a surge of high-quality 3D-aware GANs have been proposed, which leverage the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods that project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion. Although a facial prior is preserved in pre-trained 3D GANs, reconstructing a 3D portrait from only one monocular image is still an ill-posed problem. The straightforward application of 2D GAN inversion focuses on texture similarity...
Fast adversarial training (FAT) is an efficient method for improving robustness in white-box attack scenarios. However, the original FAT suffers from catastrophic overfitting, which dramatically and suddenly reduces robustness after a few training epochs. Although various FAT variants have been proposed to prevent this overfitting, they require high training time. In this paper, we investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and FAT. We find that catastrophic overfitting occurs when the attack success rate of adversarial examples becomes worse. Based...
Motor imagery (MI) is a cognitive process wherein an individual mentally rehearses a specific movement without physically executing it. Recently, MI-based brain–computer interfaces (BCIs) have attracted widespread attention. However, accurate decoding of MI and understanding of its neural mechanisms still face huge challenges. These seriously hinder the clinical application and development of BCI systems based on MI. Thus, it is very necessary to develop new methods to decode MI tasks. In this work, we propose...
In existing visual representation learning tasks, deep convolutional neural networks (CNNs) are often trained on images annotated with a single tag, such as ImageNet. However, a single tag cannot describe all the important contents of one image, and some useful visual information may be wasted during training. In this work, we propose to train CNNs from images annotated with multiple tags, to enhance the quality of the CNN model. To this end, we build a large-scale multi-label image database with 18M images and 11K categories, dubbed...
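Moving from one tag per image to multiple tags changes the training loss: softmax cross-entropy assumes a single class, while multi-label training treats each tag as an independent binary decision. A small numpy sketch of the contrast (logits and tag vectors are illustrative, not from the database above):

```python
import numpy as np

def softmax_ce(logits, target_idx):
    """Single-label loss: exactly one class per image."""
    z = logits - logits.max()                 # stabilised log-softmax
    logp = z - np.log(np.exp(z).sum())
    return -logp[target_idx]

def sigmoid_bce(logits, targets):
    """Multi-label loss: an independent sigmoid per tag."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean()

logits = np.array([2.0, -1.0, 0.5, -2.0])
single = softmax_ce(logits, target_idx=0)                 # only "class 0"
multi = sigmoid_bce(logits, np.array([1., 0., 1., 0.]))   # tags 0 and 2
print(single >= 0 and multi >= 0)  # True: both are valid losses
```

Because the sigmoid outputs are not forced to compete through a normalising sum, the multi-label objective can reward several correct tags on the same image, which is the information a single-tag annotation discards.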
Video deblurring is still an unsolved problem due to the challenging spatio-temporal modeling process, and existing convolutional neural network (CNN)-based methods show a limited capacity for effective spatial and temporal modeling for video deblurring. This paper presents VDTR, a Transformer-based model that makes the first attempt to adapt a pure Transformer for video deblurring. VDTR exploits the superior long-range relation modeling capabilities of the Transformer for both spatial and temporal modeling. However, it is non-trivial to design an appropriate Transformer-based architecture, because of the complicated non-uniform blurs, misalignment...