- Speech Recognition and Synthesis
- Music and Audio Processing
- Speech and Audio Processing
- Blockchain Technology Applications and Security
- Topic Modeling
- Evaluation and Optimization Models
- Natural Language Processing Techniques
- Quality Function Deployment in Product Design
- Multi-Criteria Decision Making
- Caching and Content Delivery
- Adversarial Robustness in Machine Learning
- Privacy-Preserving Technologies in Data
- IoT and Edge/Fog Computing
Hebei University of Economics and Business
2018-2023
Northwestern Polytechnical University
2018-2021
Recently, attention-based end-to-end speech synthesis has achieved superior performance compared to traditional models, and several approaches, such as global style tokens, have been proposed to explore the controllability of the model. Although existing methods show good performance in disentanglement and transfer, they are still unable to control the explicit emotion of generated speech. In this paper, we mainly focus on the subtle control of expressive speech synthesis, where the emotion category and strength can be easily controlled with a discrete emotional vector...
Recently, end-to-end (E2E) neural text-to-speech systems, such as Tacotron2, have begun to surpass traditional multi-stage hand-engineered systems, with both simplified system building pipelines and high-quality speech. With a unique encoder-decoder structure, Tacotron2 no longer needs a separately learned text analysis front-end, duration model, acoustic model, and audio synthesis module. The key lies in the attention mechanism, which learns an alignment between the encoder and decoder, serving as an implicit duration model...
Federated learning has emerged as a promising technique for the Internet of Things (IoT) in various domains, including supply chain management. It enables IoT devices to learn collaboratively without exposing their raw data, ensuring data privacy. However, federated learning faces the threats of local tampering and upload process attacks. This paper proposes an innovative framework that leverages Trusted Execution Environment (TEE) and blockchain technology to address these security and privacy challenges. Our framework achieves...
Adaptability and controllability in changing speaking styles and speaker characteristics are the advantages of deep neural network (DNN) based statistical parametric speech synthesis (SPSS). This paper presents a comprehensive study on the use of DNNs for expressive speech synthesis with a small set of emotional speech data. Specifically, we study three typical model adaptation approaches: (1) retraining the model by emotion-specific data (retrain), (2) augmenting the network input using emotion-specific codes (code), and (3) using emotion-dependent output layers with shared...
This paper presents a study on the use of input codes in neural network acoustic modeling for expressive TTS. Specifically, we use different kinds of input codes, augmented with linguistic features, as the input of a BLSTM-based acoustic model, to control the expressivity of the synthesized speech. The input codes, in one-hot representation, include a dialogue code, a sentiment code and a sentence position code. The dialogue code indicates whether the text is dialogue or narration in an audiobook story. The sentiment code is obtained from a sentiment analysis tool, which labels each sentence as positive, negative or neutral. The sentence position code indicates the position of the sentence in a paragraph. We...
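As a rough illustration of the input-code idea in the abstract above, the sketch below appends three one-hot codes to one frame of linguistic features. The category inventories, the three-way bucketing of sentence position, and the function names `one_hot` and `augment_frame` are assumptions made for illustration; the abstract does not specify the encoding details.

```python
# Hypothetical category inventories. The abstract names the three codes
# but does not give their exact encoding, so these lists are assumptions.
DIALOGUE = ["narration", "dialogue"]
SENTIMENT = ["positive", "negative", "neutral"]
POSITION = ["first", "middle", "last"]  # assumed bucketing of sentence position


def one_hot(value, categories):
    """Return a one-hot list with 1.0 at the index of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]


def augment_frame(linguistic_features, dialogue, sentiment, position):
    """Append the three one-hot input codes to one frame of linguistic features."""
    return (
        list(linguistic_features)
        + one_hot(dialogue, DIALOGUE)
        + one_hot(sentiment, SENTIMENT)
        + one_hot(position, POSITION)
    )


# Two placeholder linguistic features followed by 2 + 3 + 3 code dimensions.
frame = augment_frame([0.2, 0.7], "dialogue", "neutral", "first")
print(frame)
```

The augmented vector would then be fed to the acoustic model in place of the plain linguistic features; changing a code at synthesis time is what gives control over expressivity.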
A hybrid multi-attribute decision-making model with unknown attribute weights is studied. To simplify the calculation, interval numbers and linguistic variables are transformed into crisp numbers to obtain the standard decision matrix. Considering the fuzziness and correlation between attributes, the importance of attributes is represented by a fuzzy measure. To make the results more scientific, the Analytic Hierarchy Process (AHP) is used to calculate the subjective weights, and then the modified Mahalanobis-Taguchi System (MTS) is used to obtain the fuzzy measures...
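For readers unfamiliar with AHP, the sketch below shows one common way to derive weights from a pairwise comparison matrix: the row geometric-mean approximation. This is an assumption for illustration only; the abstract does not say which AHP variant the paper uses, and the matrix values and the function name `ahp_weights` are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalizing row geometric means.

    `pairwise[i][j]` states how much more important attribute i is than
    attribute j, so the matrix is expected to be reciprocal.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical, perfectly consistent judgments for three attributes.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
print(weights)
```

For a perfectly consistent matrix like this one, the geometric-mean method gives the same weights as the principal-eigenvector method; for inconsistent judgments the two can differ slightly.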