Review of end-to-end speech synthesis technology based on deep learning

DOI: 10.48550/arxiv.2104.09995
Publication Date: 2021-01-01
ABSTRACT
As an indispensable part of modern human-computer interaction systems, speech synthesis technology helps users obtain the output of intelligent machines more easily and intuitively, and has therefore attracted increasing attention. Because traditional speech synthesis technology suffers from high complexity and low efficiency, current research focuses on deep learning-based end-to-end speech synthesis, which offers more powerful modeling ability and a simpler pipeline. Such a system mainly consists of three modules: a text front-end, an acoustic model, and a vocoder. This paper reviews the research status of these three parts, and classifies and compares various methods according to their emphasis. It also summarizes open-source speech corpora in English, Chinese, and other languages that can be used for speech synthesis tasks, and introduces some commonly used subjective and objective speech quality evaluation methods. Finally, some attractive future research directions are pointed out.