Review of end-to-end speech synthesis technology based on deep learning

End-to-end principle

DOI: 10.48550/arxiv.2104.09995 Publication Date: 2021-01-01

AUTHORS (3)

Zhaoxi Mu

Xinyu Yang

Yizhuo Dong

ABSTRACT

As an indispensable part of modern human-computer interaction systems, speech synthesis technology helps users get the output of intelligent machines more easily and intuitively, and has thus attracted increasing attention. Due to the limitations of high complexity and low efficiency in traditional speech synthesis technology, the current research focus is deep learning-based end-to-end speech synthesis, which has more powerful modeling ability and a simpler pipeline. It mainly consists of three modules: text front-end, acoustic model, and vocoder. This paper reviews the research status of these three parts, and classifies and compares various methods according to their emphasis. Moreover, this paper also summarizes the open-source speech corpora in English, Chinese, and other languages that can be used for speech synthesis tasks, and introduces some commonly used subjective and objective speech quality evaluation methods. Finally, some attractive future research directions are pointed out.
