Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
DOI:
10.21437/interspeech.2021-1461
Publication Date:
2021-08-27T05:59:39Z
AUTHORS (7)
ABSTRACT
This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model that does not require supervised duration signals. The model is based on a novel attention mechanism and an iterative reconstruction loss using Soft Dynamic Time Warping, allowing it to learn token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness across several diverse multi-speaker evaluations.
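The Soft Dynamic Time Warping loss mentioned in the abstract replaces the hard minimum in the classic DTW recursion with a differentiable soft minimum, so the alignment cost can be backpropagated through. The paper's exact formulation is not reproduced here; the following is a minimal NumPy sketch of the standard soft-DTW recurrence (Cuturi & Blondel style), with the smoothing parameter `gamma` and squared-Euclidean frame distances chosen for illustration.

```python
import numpy as np

def soft_min(values, gamma):
    # Differentiable soft minimum: -gamma * log(sum(exp(-v / gamma))).
    # Computed with a max-shift for numerical stability; inf entries
    # (unreachable cells) contribute exp(-inf) = 0 and drop out.
    v = -np.asarray(values, dtype=float) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW cost between sequences x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    # Pairwise squared-Euclidean distances between frames.
    D = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    # R[i, j] = soft-DTW cost of aligning x[:i] with y[:j].
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Soft-min over the three allowed alignment moves.
            R[i, j] = D[i - 1, j - 1] + soft_min(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]
```

As gamma approaches 0 the recursion recovers hard DTW; larger gamma gives a smoother, more tractable loss surface, which is why it suits end-to-end duration learning.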