- Face recognition and analysis
- Generative Adversarial Networks and Image Synthesis
- Video Coding and Compression Technologies
- Advanced Data Compression Techniques
- Advanced Image Processing Techniques
- Speech and Audio Processing
- Computer Graphics and Visualization Techniques
- Image Enhancement Techniques
- Advanced Vision and Imaging
- Corruption and Economic Development
- Regional Economic and Spatial Analysis
- 3D Shape Modeling and Analysis
- Rural development and sustainability
- Taxation and Compliance Studies
- Migration and Labor Dynamics
- Regional Economics and Spatial Analysis
- Face and Expression Recognition
- Migration, Health and Trauma
- Urbanization and City Planning
- Underwater Acoustics Research
- Land Use and Ecosystem Services
- Digital Media Forensic Detection
- Music and Audio Processing
- Simulation and Modeling Applications
- Interpreting and Communication in Healthcare
Shanghai Jiao Tong University
2021-2024
Xi'an University of Architecture and Technology
2022-2024
China has taken significant steps to combat corruption since the 18th National Congress of the Chinese Communist Party (CCP). However, whether and how anti-corruption efforts influence the public's evaluation of local government performance remains understudied. Using multiple data sources, including panel survey data from the China Family Panel Studies from 2010 to 2018, this research examines whether anti-corruption efforts improve these evaluations by reducing the public's perception of existing corruption. Additional analysis reveals that anti-corruption efforts reduce perceived...
Face reenactment aims to generate an animation of a source face using the poses and expressions from a target face. Although recent methods have made remarkable progress by exploiting generative adversarial networks, they are limited in generating high-fidelity and identity-preserving results due to inappropriate driving information and insufficiently effective animating strategies. In this work, we propose a novel face reenactment framework that achieves both high-fidelity generation and identity preservation. Instead of sparse...
Video conferences introduce a new scenario for video transmission, which focuses on keeping the fidelity of faces even in a low-bandwidth network environment. In this work, we propose VSBNet, one of the frameworks to utilize face landmarks for compression. Our method utilizes adversarial learning to reconstruct the original frames from the landmarks. To recover more details and keep the consistency of identity, we propose the concept of visual sensitivity to separate the contour and fast-moving parts, such as the eyes and mouth. Experimental results...
Talking face generation aims at generating photorealistic video portraits of a target person driven by input audio. According to the nature of the audio to lip motions mapping, the same speech content may have different appearances even for the same person on different occasions. Such a one-to-many mapping problem brings ambiguity during training and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still...
As the latest video coding standard, versatile video coding (VVC) has shown its ability in retaining pixel quality. To excavate more compression potential for video conference scenarios under ultra-low bitrate, this paper proposes a bitrate-adjustable hybrid compression scheme for face video. This hybrid compression scheme combines the pixel-level precise recovery capability of traditional coding with the generation capability of deep learning based on abridged information, where Pixel-wise Bi-Prediction, Low-Bitrate-FOM and Lossless Keypoint Encoder collaborate to achieve PSNR...
As a special territory type, the farming–pastoral ecotone is facing challenges surrounding path creation and high-quality sustainable development. Counties are not only an important spatial unit to promote development, but also an important part of the modernization of the national governance system. County-level development is a critical driving force for a breakthrough in the farming–pastoral ecotone. First, this study systematically reviews the progress of county-level development. Then, this study adopts the “Driving Forces-Pressure-State-Impact-Responses” (DPSIR) model...
Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Due to its nature of one-to-many mapping from the input audio to the output video (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping like previous works brings ambiguity during training, and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it...
Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while the tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful 3D modeling capabilities, where a notable work is ReRF. However, ReRF separates the modeling from the compression process, resulting in suboptimal compression efficiency. In contrast, in this paper, we propose a volumetric video compression method based on dynamic...
Significant progress has been made in text-to-video generation through the use of powerful generative models and large-scale internet data. However, substantial challenges remain in precisely controlling individual concepts within the generated video, such as the motion and appearance of specific characters and the movement of viewpoints. In this work, we propose a novel paradigm that generates each concept in a 3D representation separately and then composes them with priors from Large Language Models (LLM) and 2D diffusion models....
Unsupervised face reenactment aims to animate a source image to imitate the motions of a target image while retaining the source portrait’s attributes like facial geometry, identity, hair texture, and background. While prior methods can extract the motion from the target image via compact representations (e.g., key-points or latent bases [50]), they are not robust in predicting motion that is disentangled from the portrait attributes, thus failing to preserve the portrait attributes in cross-subject reenactment. In this work, we propose an effective and cost-efficient approach...
Neural radiance fields (NeRF) have advanced the development of 3D volumetric video technology, but the large data volumes they involve pose significant challenges for storage and transmission. To address these problems, existing solutions typically compress NeRF representations after the training stage, leading to a separation between representation training and compression. In this paper, we try to directly learn a compact NeRF representation for volumetric video in the training stage based on the proposed rate-aware compression framework. Specifically, for volumetric video, we use...
In face synthesis tasks, commonly used 2D representations (e.g., landmarks, segmentation maps, etc.) are usually sparse and discontinuous. To combat these shortcomings, we utilize a dense and continuous representation, named Projected Normalized Coordinate Code (PNCC), as the guidance and develop a PNCC-Spatio-Normalization (PSN) method to achieve face synthesis regarding arbitrary head poses and expressions. Based on PSN, we provide an effective framework for the face reenactment and face swapping task. To ensure harmonious and seamless swapping,...
As video conferencing becomes an indispensable part of people's daily life, how to achieve a high-fidelity calling experience under low bandwidth has been a popular and challenging issue. Deep generative models have great potential in low-bandwidth facial video compression due to their excellent generation capability based on abridged information. Nevertheless, existing deep generation-based compression methods tend to handle the motion information in pure 2D or pseudo 3D space, causing facial distortion when large head poses are...