Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

DOI: 10.21437/interspeech.2021-1461 | Publication Date: 2021-08-27
ABSTRACT
This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with fully differentiable duration modeling that does not require supervised duration signals. The model is based on a novel attention mechanism and an iterative reconstruction loss using Soft Dynamic Time Warping, which lets it learn token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness on several diverse multi-speaker evaluations.
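The Soft Dynamic Time Warping loss mentioned in the abstract replaces the hard minimum in the classic DTW recursion with a smoothed minimum, making the alignment cost differentiable with respect to the predicted frames. The sketch below (following Cuturi & Blondel's Soft-DTW formulation, not the paper's own code) shows the core dynamic program; the sequences, the squared-Euclidean frame distance, and the `gamma` smoothing parameter are illustrative assumptions.

```python
import numpy as np

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW alignment cost between two sequences of frame vectors.

    A minimal sketch of the generic Soft-DTW recursion, not the exact
    loss used in Parallel Tacotron 2. As gamma -> 0 this approaches the
    ordinary (non-differentiable) DTW cost.
    """
    n, m = len(x), len(y)
    # Pairwise squared-Euclidean distances between frames.
    D = np.array([[np.sum((xi - yj) ** 2) for yj in y] for xi in x])
    # DP table; R[i, j] is the soft-minimum cost of aligning x[:i], y[:j].
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a = np.array([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]])
            # Numerically stable smoothed minimum:
            # softmin_gamma(a) = -gamma * log(sum(exp(-a / gamma)))
            z = -a / gamma
            zmax = z.max()
            softmin = -gamma * (zmax + np.log(np.sum(np.exp(z - zmax))))
            R[i, j] = D[i - 1, j - 1] + softmin
    return R[n, m]
```

Because every operation is smooth, gradients of this cost with respect to the predicted frame sequence `x` can flow back through the alignment, which is what allows durations to be learned without supervised alignment targets.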