Anni Tang

ORCID: 0000-0002-9772-3293
Research Areas
  • Face recognition and analysis
  • Generative Adversarial Networks and Image Synthesis
  • Video Coding and Compression Technologies
  • Advanced Data Compression Techniques
  • Advanced Image Processing Techniques
  • Speech and Audio Processing
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • Advanced Vision and Imaging
  • Corruption and Economic Development
  • Regional Economic and Spatial Analysis
  • 3D Shape Modeling and Analysis
  • Rural development and sustainability
  • Taxation and Compliance Studies
  • Migration and Labor Dynamics
  • Face and Expression Recognition
  • Migration, Health and Trauma
  • Urbanization and City Planning
  • Underwater Acoustics Research
  • Land Use and Ecosystem Services
  • Digital Media Forensic Detection
  • Music and Audio Processing
  • Simulation and Modeling Applications
  • Interpreting and Communication in Healthcare

Shanghai Jiao Tong University
2021-2024

Xi'an University of Architecture and Technology
2022-2024

China has taken significant steps to combat corruption since the 18th National Congress of the Chinese Communist Party (CCP). However, whether and how anti-corruption efforts influence the public's evaluation of local government performance remains understudied. Using multiple data sources, including panel survey data from the China Family Panel Studies (2010–2018), this research examines whether anti-corruption efforts improve evaluations by reducing the public's perception of existing corruption. Additional analysis reveals that they reduce perceived...

10.1017/s030574102400167x article EN The China Quarterly 2025-01-24

Face reenactment aims to generate an animation of a source face using the poses and expressions from a target face. Although recent methods have made remarkable progress by exploiting generative adversarial networks, they are limited in generating high-fidelity, identity-preserving results due to inappropriate driving information and insufficiently effective animating strategies. In this work, we propose a novel framework that achieves both high-fidelity generation and identity preservation. Instead of sparse...

10.1145/3571857 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-11-23

Video conferences introduce a new scenario for video transmission, one that focuses on keeping the fidelity of faces even in low-bandwidth network environments. In this work, we propose VSBNet, a framework that utilizes face landmarks for compression. Our method uses adversarial learning to reconstruct the original frames from landmarks. To recover more details and keep identity consistency, we introduce the concept of visual sensitivity to separate the facial contour from fast-moving parts, such as the eyes and mouth. Experimental results...

10.1109/icmew53276.2021.9455985 article EN 2021-06-21
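To give a sense of why landmark-based transmission helps under low bandwidth, here is a back-of-the-envelope comparison of the payload for raw frames versus 2D landmarks. The 68-point landmark set, 16-bit coordinates, frame size, and frame rate are illustrative assumptions, not values from the VSBNet paper:

```python
# Illustrative bitrate comparison: sending raw pixels vs. face landmarks.

def raw_frame_bits(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Bits needed for one uncompressed RGB frame."""
    return width * height * bits_per_pixel

def landmark_frame_bits(n_points: int = 68, bits_per_coord: int = 16) -> int:
    """Bits needed to send one frame's 2D landmarks (x, y per point)."""
    return n_points * 2 * bits_per_coord

def bitrate_kbps(bits_per_frame: int, fps: int = 25) -> float:
    """Convert a per-frame payload to a bitrate in kilobits per second."""
    return bits_per_frame * fps / 1000.0

raw = bitrate_kbps(raw_frame_bits(256, 256))    # uncompressed baseline
lmk = bitrate_kbps(landmark_frame_bits())       # landmark payload only
print(f"raw: {raw:.0f} kbps, landmarks: {lmk:.1f} kbps, ratio: {raw / lmk:.0f}x")
```

Even before entropy coding, the landmark stream is orders of magnitude smaller than raw pixels, which is the headroom the generative decoder exploits.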

Talking face generation aims at generating photorealistic video portraits of a target person driven by input audio. Owing to the one-to-many nature of the audio-to-lip-motion mapping, the same speech content may have different visual appearances on different occasions. Such a one-to-many mapping problem brings ambiguity during training and thus causes inferior visual results. Although this could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still...

10.1109/tpami.2024.3409380 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-01-01
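The ambiguity described above has a simple mathematical face: when one input has several valid targets, the MSE-optimal deterministic prediction is their average, which matches none of them. The toy lip-shape targets below are made up for illustration; they are not data from the paper:

```python
import numpy as np

# Two plausible lip shapes for the *same* audio input (illustrative values).
targets = np.array([
    [0.2, 0.9],   # valid appearance A
    [0.8, 0.1],   # valid appearance B
])

def mse(pred: np.ndarray) -> float:
    """Mean squared error of one deterministic prediction vs. all valid targets."""
    return float(np.mean((targets - pred) ** 2))

# The closed-form MSE minimizer is the mean of the targets: an "in-between"
# mouth that scores better than either real appearance, hence blurry results.
best = targets.mean(axis=0)
print(best, mse(best))
assert mse(best) < mse(targets[0]) and mse(best) < mse(targets[1])
```

This is why a deterministic regressor trained with an L2-style loss tends toward averaged, under-articulated lip motion, motivating probabilistic or two-stage designs.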

10.1109/icme57554.2024.10687892 article EN 2024 IEEE International Conference on Multimedia and Expo (ICME) 2024-07-15

As the latest video coding standard, Versatile Video Coding (VVC) has shown its ability in retaining pixel quality. To excavate more compression potential for conference scenarios under ultra-low bitrate, this paper proposes a bitrate-adjustable hybrid coding scheme for face video. The scheme combines the pixel-level precise recovery capability of traditional coding with the generation capability of deep learning based on abridged information, where Pixel-wise Bi-Prediction, Low-Bitrate-FOM and a Lossless Keypoint Encoder collaborate to achieve PSNR...

10.1109/icme52920.2022.9859867 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2022-07-18
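A hybrid scheme of this kind needs a per-segment mode decision: below some bitrate the frames travel as compact keypoints and are synthesized at the decoder, above it a conventional pixel codec such as VVC takes over. The sketch below shows only that switching logic; the 30 kbps switch point and the mode names are illustrative assumptions, not values from the paper:

```python
# Minimal sketch of a bitrate-adjustable hybrid mode decision.

def choose_mode(target_kbps: float, switch_kbps: float = 30.0) -> str:
    """Pick the coding path for a face-video segment by target bitrate."""
    return "generative-keypoints" if target_kbps < switch_kbps else "vvc-pixels"

for rate in (8, 24, 64, 512):
    print(rate, "kbps ->", choose_mode(rate))
```

In a real encoder the decision would be rate-distortion driven rather than a fixed threshold, but the structure — a generative path for ultra-low bitrates, a pixel path otherwise — is the same.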

As a special territory type, the farming–pastoral ecotone faces challenges surrounding path creation and high-quality sustainable development. Counties are not only an important spatial unit for promoting development, but also part of the modernization of the national governance system. County-level development is a critical driving force for a breakthrough in the farming–pastoral ecotone. First, this study systematically reviews the research progress. Then, it adopts the "Driving Forces-Pressure-State-Impact-Responses" (DPSIR) model...

10.3390/agriculture12122042 article EN cc-by Agriculture 2022-11-29

Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Due to the one-to-many nature of the mapping from audio to visual output (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping as in previous works brings ambiguity during training, and thus causes inferior results. Although this could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it...

10.48550/arxiv.2212.05005 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01

Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while their tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful modeling capabilities, where a notable work is ReRF. However, ReRF separates the representation from the compression process, resulting in suboptimal compression efficiency. In contrast, in this paper, we propose a method based on dynamic...

10.48550/arxiv.2402.01380 preprint EN arXiv (Cornell University) 2024-02-02

Significant progress has been made in text-to-video generation through the use of powerful generative models and large-scale internet data. However, substantial challenges remain in precisely controlling individual concepts within the generated video, such as the motion and appearance of specific characters and the movement of viewpoints. In this work, we propose a novel paradigm that generates each concept as a separate 3D representation and then composes them with priors from Large Language Models (LLMs) and 2D diffusion models....

10.48550/arxiv.2409.00558 preprint EN arXiv (Cornell University) 2024-08-31

10.1109/icme57554.2024.10687925 article EN 2024 IEEE International Conference on Multimedia and Expo (ICME) 2024-07-15

Unsupervised face reenactment aims to animate a source image to imitate the motions of a target image while retaining the source portrait's attributes such as facial geometry, identity, hair texture, and background. While prior methods can extract motion from the target via compact representations (e.g., key-points or latent bases [50]), they are not robust in predicting motion that is disentangled from portrait attributes, thus failing to preserve identity in cross-subject reenactment. In this work, we propose an effective and cost-efficient approach...

10.1145/3698769 article EN ACM Transactions on Multimedia Computing Communications and Applications 2024-10-04
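The "compact representations" mentioned above usually mean that per-frame motion is a small coefficient vector over a shared, learned basis, so only the coefficients change from frame to frame. The sketch below shows that decoding step; the basis values and dimensions are illustrative assumptions, not learned parameters from the paper:

```python
import numpy as np

# Minimal sketch of motion coding with learned latent bases.
rng = np.random.default_rng(1)
n_bases, motion_dim = 4, 10
bases = rng.normal(size=(n_bases, motion_dim))  # shared across all frames

def decode_motion(coeffs: np.ndarray) -> np.ndarray:
    """Reconstruct a dense motion vector from its compact basis coefficients."""
    return coeffs @ bases

coeffs = np.array([0.5, -0.2, 0.0, 1.0])        # compact per-frame code
motion = decode_motion(coeffs)
print(motion.shape)                              # dense motion vector
```

Because the decoding is linear in the coefficients, whether identity leaks into the motion depends entirely on what the bases encode — which is exactly the disentanglement problem the abstract describes.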

Neural radiance fields (NeRF) have advanced the development of 3D volumetric video technology, but the large data volumes they involve pose significant challenges for storage and transmission. To address these problems, existing solutions typically compress NeRF representations after the training stage, leading to a separation between representation and compression. In this paper, we try to directly learn a compact representation in the training stage based on a proposed rate-aware compression framework. Specifically, for volumetric video, we use...

10.48550/arxiv.2411.05322 preprint EN arXiv (Cornell University) 2024-11-07
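A rate-aware objective of the kind described above typically has the form "loss = distortion + λ · rate", where the rate term estimates how many bits the quantized features would cost. The sketch below uses empirical entropy as a stand-in rate estimate; the quantization step, λ, and this particular estimator are illustrative assumptions (real frameworks use differentiable entropy models), not the paper's formulation:

```python
import numpy as np

def estimated_rate_bits(features: np.ndarray, step: float = 0.1) -> float:
    """Empirical entropy (bits per symbol) of uniformly quantized features."""
    q = np.round(features / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def rd_loss(pred: np.ndarray, target: np.ndarray,
            features: np.ndarray, lam: float = 0.01) -> float:
    """Distortion (MSE) plus a lambda-weighted estimate of the bitrate."""
    distortion = float(np.mean((pred - target) ** 2))
    return distortion + lam * estimated_rate_bits(features)

rng = np.random.default_rng(0)
feats = rng.normal(size=1000)
print(rd_loss(feats, feats, feats))  # zero distortion, pure rate term
```

Minimizing such a joint loss during training is what lets the representation itself become compact, instead of compressing a finished representation afterwards.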

In face synthesis tasks, commonly used 2D representations (e.g. landmarks, segmentation maps, etc.) are usually sparse and discontinuous. To combat these shortcomings, we utilize a dense continuous representation, named Projected Normalized Coordinate Code (PNCC), as the guidance and develop a PNCC-Spatio-Normalization (PSN) method to achieve synthesis under arbitrary head poses and expressions. Based on PSN, we provide an effective framework for the reenactment and swapping tasks. To ensure harmonious and seamless swapping,...

10.1109/fg52635.2021.9667065 article EN 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) 2021-12-15
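The core idea behind a normalized coordinate code is that each 3D vertex of a face mesh gets a unique RGB-like code by min-max normalizing its (x, y, z) coordinates into [0, 1] per axis; rendering these codes from a given pose yields the dense, continuous correspondence map (the projected variant, PNCC). The toy four-vertex "mesh" below is an illustrative assumption, not 3DMM data:

```python
import numpy as np

def normalized_coordinate_code(vertices: np.ndarray) -> np.ndarray:
    """Min-max normalize mesh coordinates per axis into [0, 1] codes."""
    vmin = vertices.min(axis=0)
    vmax = vertices.max(axis=0)
    return (vertices - vmin) / (vmax - vmin)

mesh = np.array([[-1.0, -1.0, 0.00],
                 [ 1.0, -1.0, 0.50],
                 [ 0.0,  1.0, 1.00],
                 [ 0.0,  0.0, 0.25]])
codes = normalized_coordinate_code(mesh)
print(codes)  # every vertex now carries a code in [0, 1]^3
```

Because the codes vary smoothly over the mesh surface, the rendered map is dense and continuous — exactly the properties that sparse landmarks lack.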

As video conferencing becomes an indispensable part of human daily life, how to achieve a high-fidelity calling experience under low bandwidth has become a popular and challenging issue. Deep generative models have great potential in low-bandwidth facial compression due to their excellent generation capability based on abridged information. Nevertheless, existing deep generation-based methods tend to handle motion information in pure 2D or pseudo-3D space, causing distortion when large head poses are...

10.1109/vcip59821.2023.10402733 article EN 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP) 2023-12-04

As the latest video coding standard, Versatile Video Coding (VVC) has shown its ability in retaining pixel quality. To excavate more compression potential for conference scenarios under ultra-low bitrate, this paper proposes a bitrate-adjustable hybrid coding scheme for face video. The scheme combines the pixel-level precise recovery capability of traditional coding with the generation capability of deep learning based on abridged information, where Pixel-wise Bi-Prediction, Low-Bitrate-FOM and a Lossless Keypoint Encoder collaborate to achieve PSNR...

10.48550/arxiv.2204.10055 preprint EN other-oa arXiv (Cornell University) 2022-01-01