- Speech and Audio Processing
- Caching and Content Delivery
- Music and Audio Processing
- Peer-to-Peer Network Technologies
- Advanced MIMO Systems Optimization
- Image Processing Techniques and Applications
- Generative Adversarial Networks and Image Synthesis
- IoT and Edge/Fog Computing
- Face recognition and analysis
- Blind Source Separation Techniques
- Adversarial Robustness in Machine Learning
- Advanced Adaptive Filtering Techniques
- Recommender Systems and Techniques
- Video Analysis and Summarization
- Advanced Research in Science and Engineering
- Speech Recognition and Synthesis
- Energy Harvesting in Wireless Networks
- Image Processing and 3D Reconstruction
- Age of Information Optimization
- Cooperative Communication and Network Coding
- Digital Media Forensic Detection
- Advanced Computational Techniques and Applications
Shanghai University of Political Science and Law
2023-2024
PLA Information Engineering University
2020
While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound...
Edge learning is a promising enabler to leverage the distributed local data for powering artificial intelligence at the edge network. Moreover, incorporating external domain knowledge into purely data-driven models can further enhance performance. In this paper, by taking the benefits of both edge learning and knowledge fusion, we propose a novel knowledge-aware edge learning framework, in which edge devices individually train models with the assistance of local knowledge bases and a global knowledge base at the server. Due to limited cache capability, the edge device can only cache a small-scale knowledge base, which restricts...
Speaker embeddings need to be compact within each class and well separated between classes, while the traditional cross-entropy loss function with softmax only guarantees separability, resulting in dispersion of the learned features and poor generalization in the measurement space. Therefore, from the perspective of enhancing the discrimination of the speaker embeddings, we improve the model of the system, which effectively improves the segmentation and clustering performance of the system. On the one hand, we introduce AM-Softmax...
Existing Voice Cloning (VC) tasks aim to convert a paragraph of text to speech with a desired voice specified by a reference audio. This has significantly boosted the development of artificial speech applications. However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to have emotions consistent with the movie plots. To fill this gap, in this work we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to speech with both the desired voice specified by a reference audio and the desired emotion specified by a reference video. To facilitate research in this field, we construct a dataset,...