Voice Conversion Using Input-to-Output Highway Networks

Keywords: Smoothing, Cepstrum, Mel-frequency cepstrum
DOI: 10.1587/transinf.2017edl8034 Publication Date: 2017-07-31T22:19:50Z
ABSTRACT
This paper proposes Deep Neural Network (DNN)-based Voice Conversion (VC) using input-to-output highway networks. VC is a speech synthesis technique that converts input features into output speech parameters, and DNN-based acoustic models for VC are used to estimate the output speech parameters from the input speech parameters. Given that the input and output are often in the same domain (e.g., cepstrum) in VC, this paper proposes VC using highway networks connected from the input to the output. The acoustic models predict the weighted spectral differentials between the input and output spectral parameters. The architecture not only alleviates over-smoothing effects that degrade speech quality, but also effectively represents the characteristics of spectral parameters. The experimental results demonstrate that the proposed architecture outperforms Feed-Forward neural networks in terms of the speech quality and speaker individuality of the converted speech.
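The abstract's idea of connecting the input to the output and predicting a weighted spectral differential can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the converted frame is the input frame plus a gated differential, with a sigmoid gate computed from the input. The function names, layer sizes, and random untrained weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, used here for the gate."""
    return 1.0 / (1.0 + np.exp(-z))


def init_layer(rng, n_in, n_out):
    """Hypothetical helper: small random weights and a zero bias."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def highway_convert(x, params):
    """Convert frames x (n_frames, dim) with an input-to-output skip path.

    Assumed form: y = x + gate(x) * differential(x), where the differential
    comes from a small feed-forward network and the gate weights it per
    dimension. The gate form is an assumption, not taken from the paper.
    """
    (w_h, b_h), (w_g, b_g), (w_t, b_t) = params
    hidden = np.maximum(0.0, x @ w_h + b_h)   # ReLU hidden layer
    differential = hidden @ w_g + b_g         # predicted spectral differential
    gate = sigmoid(x @ w_t + b_t)             # weights in (0, 1) per dimension
    return x + gate * differential            # input carried straight to output


rng = np.random.default_rng(0)
dim, hidden_dim = 25, 64                      # hypothetical feature sizes
params = (
    init_layer(rng, dim, hidden_dim),
    init_layer(rng, hidden_dim, dim),
    init_layer(rng, dim, dim),
)
frames = rng.normal(size=(10, dim))           # stand-in for cepstral frames
converted = highway_convert(frames, params)
```

Because the input is added directly to the output, a network that predicts a zero differential returns the input frames unchanged, so the model only has to learn how the output spectrum differs from the input.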