A study of speaker adaptation for DNN-based speech synthesis

Keywords: Adaptability; Speaker diarisation; Word error rate
DOI: 10.7488/ds/259 Publication Date: 2015-09-06
ABSTRACT
A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied. In this paper, we conduct an experimental analysis of speaker adaptation for DNN-based speech synthesis at different levels. In particular, we augment the linguistic features with a low-dimensional speaker-specific vector as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify the generated acoustic features. We analyse the performance of each individual adaptation technique and that of their combinations. Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.
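The abstract names three levels at which a DNN acoustic model can be adapted to a new speaker. The following is a minimal, hypothetical sketch in plain Python of where each level enters a feed-forward pass; it is not the authors' implementation. Assumptions (not stated in the abstract): tanh hidden units, a linear output layer, hidden-unit scaling parameterised as 2·sigmoid(r) (the "learning hidden unit contributions" form), and a simple affine transform A·y + c standing in for the output-layer feature space transformation. All function names, dimensions, and values are illustrative.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    """Element-wise vector addition."""
    return [ai + bi for ai, bi in zip(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layers(sizes, seed=0):
    """Hypothetical random weights for a network with the given layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def forward(linguistic, speaker_vec, layers, lhuc=None, out_transform=None):
    """Forward pass showing the three adaptation levels (assumed forms)."""
    # Level 1 (input): append a low-dimensional speaker vector to the linguistic features.
    h = list(linguistic) + list(speaker_vec)
    n_hidden = len(layers) - 1
    for i, (W, b) in enumerate(layers):
        z = add(matvec(W, h), b)
        if i < n_hidden:
            h = [math.tanh(v) for v in z]
            if lhuc is not None:
                # Level 2 (model): scale each hidden activation by 2*sigmoid(r), r speaker-specific.
                h = [2.0 * sigmoid(r) * v for r, v in zip(lhuc[i], h)]
        else:
            h = z  # linear output layer producing acoustic features
    if out_transform is not None:
        # Level 3 (output): affine transform of the generated acoustic features.
        A, c = out_transform
        h = add(matvec(A, h), c)
    return h


# Illustrative usage with made-up dimensions: 4 linguistic + 2 speaker inputs, 3 outputs.
ling = [0.5, -0.2, 0.1, 0.0]
spk = [0.3, -0.7]
layers = init_layers([len(ling) + len(spk), 8, 8, 3])
base = forward(ling, spk, layers)
adapted = forward(
    ling, spk, layers,
    lhuc=[[0.5] * 8, [-0.5] * 8],
    out_transform=([[1.1, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 1.0]], [0.05, 0.0, -0.05]),
)
```

With all scaling parameters at zero, 2·sigmoid(0) = 1, so the scaled network reproduces the unadapted one. In this sketch, adapting to a target speaker would mean estimating only the speaker vector, the per-layer `r` values, and `A`, `c` from that speaker's data while the shared weights stay fixed.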