Qi Chen

ORCID: 0009-0000-7982-9329
Research Areas
  • Speech and Audio Processing
  • Caching and Content Delivery
  • Music and Audio Processing
  • Peer-to-Peer Network Technologies
  • Advanced MIMO Systems Optimization
  • Image Processing Techniques and Applications
  • Generative Adversarial Networks and Image Synthesis
  • IoT and Edge/Fog Computing
  • Face recognition and analysis
  • Blind Source Separation Techniques
  • Adversarial Robustness in Machine Learning
  • Advanced Adaptive Filtering Techniques
  • Recommender Systems and Techniques
  • Video Analysis and Summarization
  • Advanced Research in Science and Engineering
  • Speech Recognition and Synthesis
  • Energy Harvesting in Wireless Networks
  • Image Processing and 3D Reconstruction
  • Age of Information Optimization
  • Cooperative Communication and Network Coding
  • Digital Media Forensic Detection
  • Advanced Computational Techniques and Applications

Shanghai University of Political Science and Law
2023-2024

PLA Information Engineering University
2020

While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations, such as facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model to extract these representations, whose performance sets an upper bound...

10.48550/arxiv.2303.17550 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Edge learning is a promising enabler for leveraging distributed local data to power artificial intelligence at the network edge. Moreover, incorporating external domain knowledge into purely data-driven models can further enhance performance. In this paper, taking advantage of both edge learning and knowledge fusion, we propose a novel knowledge-aware edge learning framework, in which edge devices individually train their models with the assistance of knowledge bases from a global knowledge base server. Due to limited cache capability, each device can only cache a small-scale knowledge base, which restricts...

10.1109/wcnc55385.2023.10119099 article EN 2023 IEEE Wireless Communications and Networking Conference (WCNC) 2023-03-01

Speaker embeddings need to be compact within each class and well separated between classes, whereas the traditional cross-entropy loss with softmax only guarantees separability, resulting in dispersed learned features and poor generalization in the measurement space. Therefore, to enhance the discriminability of the embeddings, we improve the embedding model of the system, which effectively improves its segmentation and clustering performance. On the one hand, we introduce AM-Softmax...

10.1145/3436369.3437440 article EN 2020-10-30

Existing Voice Cloning (VC) tasks aim to convert a paragraph of text to speech with a desired voice specified by a reference audio. This has significantly boosted the development of artificial speech applications. However, there are also many scenarios that these VC tasks cannot reflect well, such as movie dubbing, which requires the speech to carry emotions consistent with the movie plots. To fill this gap, in this work we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to speech with both the desired voice specified by a reference audio and the desired emotion specified by a reference video. To facilitate research in this field, we construct a dataset,...

10.48550/arxiv.2111.12890 preprint EN other-oa arXiv (Cornell University) 2021-01-01