Yanhao Ge

ORCID: 0000-0002-5650-5118
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Face recognition and analysis
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Anomaly Detection Techniques and Applications
  • Facial Nerve Paralysis Treatment and Research
  • Human Motion and Animation
  • Face and Expression Recognition
  • Advanced Mathematical Modeling in Engineering
  • Advanced Data Compression Techniques
  • Computer Graphics and Visualization Techniques
  • Domain Adaptation and Few-Shot Learning
  • Gait Recognition and Analysis
  • Nuclear reactor physics and engineering
  • 3D Shape Modeling and Analysis
  • Chaos-based Image/Signal Encryption
  • Speech and dialogue systems
  • Image Retrieval and Classification Techniques
  • Numerical methods for differential equations
  • Advanced Image Processing Techniques
  • Speech and Audio Processing
  • Image Processing Techniques and Applications
  • Medical Image Segmentation Techniques
  • Advanced Image and Video Retrieval Techniques
  • Model Reduction and Neural Networks

Tencent (China)
2018-2022

Non-parametric face modeling aims to reconstruct 3D faces only from images, without shape assumptions. While plausible facial details are predicted, the models tend to over-depend on local color appearance and suffer from ambiguous noise. To address this problem, this paper presents a novel Learning to Aggregate and Personalize (LAP) framework for unsupervised robust modeling. Instead of using a controlled environment, the proposed method implicitly disentangles ID-consistent and scene-specific faces from an unconstrained photo set....

10.1109/cvpr46437.2021.01399 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

This work presents the FaceX framework, a novel facial generalist model capable of handling diverse tasks simultaneously. To achieve this goal, we first formulate a unified representation for a broad spectrum of editing tasks, which macroscopically decomposes a face into fundamental identity, intra-personal variation, and environmental factors. Based on this, we introduce Facial Omni-Representation Decomposing (FORD) for seamless manipulation of various components, microscopically decomposing the core aspects...

10.48550/arxiv.2401.00551 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Facial Appearance Editing (FAE) aims to modify physical attributes of human facial images, such as pose, expression, and lighting, while preserving attributes like identity and background, and is of great importance in photography. Despite the progress in this area, current research generally faces three challenges: low generation fidelity, poor attribute preservation, and inefficient inference. To overcome these challenges, this paper presents DiffFAE, a one-stage, highly efficient diffusion-based framework...

10.48550/arxiv.2403.17664 preprint EN arXiv (Cornell University) 2024-03-26

Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the results have not yet achieved satisfactory performance levels. To address this issue, we introduce Face-Adapter, an efficient and effective adapter designed for high-precision, high-fidelity editing with these models. We observe that both reenactment and swapping tasks essentially...

10.48550/arxiv.2405.12970 preprint EN arXiv (Cornell University) 2024-05-21

3D-aware GANs have shown impressive power in 3D control of synthesized portraits. While plausible facial reality is achieved, the inherent properties of the generated results have not actually been well analyzed. One of the reasons is that widely used metrics, such as Inception Score (IS) or Fréchet Inception Distance (FID), focus more on perceptual features than on explicit clues. In this article, we propose two novel metrics, which measure face consistency and diversity at a 3D level, to complement IS and FID in GAN evaluation....

10.1109/jstsp.2023.3273781 article EN IEEE Journal of Selected Topics in Signal Processing 2023-05-08

We propose an efficient framework, called Simple Swap (SimSwap), aiming for generalized and high-fidelity face swapping. In contrast to previous approaches that either lack the ability to generalize to arbitrary identities or fail to preserve attributes like facial expression and gaze direction, our framework is capable of transferring the identity of a source face into a target face while preserving the attributes of the target face. We overcome the above defects in the following two ways. First, we present the ID Injection Module (IIM), which transfers identity information at feature...

10.1145/3394171.3413630 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

This paper presents a novel Physically-guided Disentangled Implicit Rendering (PhyDIR) framework for high-fidelity 3D face modeling. The motivation comes from two observations: widely used graphics renderers yield excessive approximations against photo-realistic imaging, while neural rendering methods produce superior appearances but are too highly entangled to perceive 3D-aware operations. Hence, we learn to disentangle the implicit rendering via explicit physical guidance, guaranteeing the properties of: (1)...

10.1109/cvpr52688.2022.01971 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

In this work, we propose a novel framework named Region-Aware Network (RANet), which learns an anti-confusion ability for human pose estimation in cases of heavy occlusion, nearby persons, and symmetric appearance. Specifically, the proposed method addresses three key aspects: data augmentation, feature learning, and prediction fusion. First, we propose Parsing-based Data Augmentation (PDA) to generate abundant data that synthesizes confusing textures. Second, we not only propose a Feature Pyramid Stem (FPS)...

10.48550/arxiv.1905.00996 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images. In this paper, we propose a novel framework dubbed StyleMaster for this task, leveraging pretrained Stable Diffusion (SD), which tries to solve previous problems such as insufficient style and inconsistent semantics. The enhancement lies in two modules, namely a multi-source embedder and a dynamic attention adapter. In order to provide SD with better embeddings, we consider both global and local level visual...

10.48550/arxiv.2405.15287 preprint EN arXiv (Cornell University) 2024-05-24

In the field of image editing, three core challenges persist: controllability, background preservation, and efficiency. Inversion-based methods rely on time-consuming optimization to preserve the features of the initial images, which results in low efficiency due to the requirement for extensive network inference. Conversely, inversion-free methods lack theoretical support for background similarity, as they circumvent the issue of maintaining initial features in order to achieve efficiency. As a consequence, none of these methods can achieve both high efficiency and consistency. To tackle the aforementioned...

10.48550/arxiv.2410.04844 preprint EN arXiv (Cornell University) 2024-10-07

Face swapping aims to generate results that combine the identity from the source with the attributes of the target. Existing methods primarily focus on image-based face swapping. When processing videos, each frame is handled independently, making it difficult to ensure temporal stability. From a model perspective, face swapping is gradually shifting from generative adversarial networks (GANs) to diffusion models (DMs), as DMs have been shown to possess stronger capabilities. Current diffusion-based approaches often employ inpainting...

10.48550/arxiv.2411.18293 preprint EN arXiv (Cornell University) 2024-11-27

Practical and efficient face alignment has been in high demand and widely studied in recent years, especially under the trend of edge computation and real-time operation. There is also a critical need to deal with masked faces in the context of the COVID-19 epidemic. In this paper, we propose a novel cascaded facial landmark detector for face alignment, which we call QCN (Quantized Cascaded Network). QCN consists of three stages spanning estimation and refinement. The first stage helps pre-align the face and alleviate extreme poses. The next two stages localize...

10.1109/icmew53276.2021.9455962 article EN 2021-06-21

Deepfake aims to swap a face in an image with someone else’s likeness in a reasonable manner. Existing methods usually perform deepfake frame by frame, thus ignoring video consistency and producing incoherent results. To address this problem, we propose a novel framework, Neural Identity Carrier (NICe), which learns identity transformation from an arbitrary face-swapping proxy via a U-Net. By modeling the incoherence between frames as noise, NICe naturally suppresses its disturbance and preserves primary...

10.3390/fi13110298 article EN cc-by Future Internet 2021-11-22

Human pose estimation is the task of localizing body keypoints from still images. State-of-the-art methods suffer from insufficient examples of challenging cases such as symmetric appearance, heavy occlusion, and nearby persons. To enlarge the number of such cases, previous methods augmented images by cropping and pasting image patches with weak semantics, which leads to unrealistic appearance and limited diversity. We instead propose Semantic Data Augmentation (SDA), a method that augments images with segmented parts of various semantic...

10.48550/arxiv.2008.00697 preprint EN other-oa arXiv (Cornell University) 2020-01-01

In-the-wild 3D face modelling is a challenging problem, as the predicted facial geometry and texture suffer from a lack of reliable clues or priors when the input images are degraded. To address this problem, in this paper we propose a novel Learning to Restore (L2R) framework for unsupervised high-quality reconstruction from low-resolution images. Rather than directly refining 2D image appearance, L2R learns to recover fine-grained details on a proxy against degradation by extracting generative priors....

10.1109/cvpr52688.2022.00420 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Pedestrian attribute recognition, a multi-task problem, is a popular task in computer vision. Generally, end-to-end deep learning networks that predict attributes are the basic method for solving this problem. To make full use of the neural network, this paper proposes a novel network structure called Raft Block. Raft Block is designed not only to extract task-specific features but also to share features across different tasks. Using Raft Block, we build Raftnet for pedestrian attribute recognition. We conduct experiments on three public...

10.1145/3297156.3297260 article EN Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence 2018-12-08

Human motion prediction aims to predict future 3D skeletal sequences given a limited human motion sequence as input. Two popular methods, recurrent neural networks and feed-forward deep networks, are able to predict the rough motion trend, but details such as limb movement may be lost. To predict more accurate future motion, we propose an Adversarial Refinement Network (ARNet) following a simple yet effective coarse-to-fine mechanism with novel adversarial error augmentation. Specifically, we take both the historical motion and the coarse prediction as input to our cascaded...

10.48550/arxiv.2011.11221 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Non-parametric face modeling aims to reconstruct 3D faces only from images, without shape assumptions. While plausible facial details are predicted, the models tend to over-depend on local color appearance and suffer from ambiguous noise. To address this problem, this paper presents a novel Learning to Aggregate and Personalize (LAP) framework for unsupervised robust modeling. Instead of using a controlled environment, the proposed method implicitly disentangles ID-consistent and scene-specific faces from an unconstrained photo set....

10.48550/arxiv.2106.07852 preprint EN other-oa arXiv (Cornell University) 2021-01-01