Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

Keywords: PSQM, Intelligibility, Codec, Representation
DOI: 10.21437/interspeech.2021-475 Publication Date: 2021-08-27T05:59:39Z
ABSTRACT
We propose using self-supervised discrete representations for the task of speech resynthesis. To generate a disentangled representation, we separately extract low-bitrate representations of the speech content, prosodic information, and speaker identity. This allows us to synthesize speech in a controllable manner. We analyze various state-of-the-art, self-supervised representation learning methods and shed light on the advantages of each method while considering reconstruction quality and disentanglement properties. Specifically, we evaluate F0 reconstruction, speaker identification performance (for both resynthesis and voice conversion), recordings' intelligibility, and overall quality using subjective human evaluation. Lastly, we demonstrate how these representations can be used for an ultra-lightweight speech codec. Using the obtained representations, we can reach a rate of 365 bits per second while providing better speech quality than the baseline methods. Audio samples can be found under the following link: this http URL.
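As a rough illustration of the setup the abstract describes (discrete content units, a quantized F0 stream, and a speaker identity combined by a decoder into a waveform), here is a minimal PyTorch sketch. The module name `ResynthesisDecoder`, all dimensions, codebook sizes, and the toy transposed-convolution decoder are illustrative assumptions and not the paper's actual architecture, which uses a full neural vocoder.

```python
# Minimal sketch, assuming discrete content units and quantized F0 at the same
# frame rate plus a per-utterance speaker index. All sizes are hypothetical.
import torch
import torch.nn as nn


class ResynthesisDecoder(nn.Module):
    def __init__(self, n_content_units=100, n_f0_bins=32, n_speakers=200,
                 dim=128, hop=320):
        super().__init__()
        # Separate lookup tables keep the three factors disentangled at the input.
        self.content_emb = nn.Embedding(n_content_units, dim)
        self.f0_emb = nn.Embedding(n_f0_bins, dim)
        self.speaker_emb = nn.Embedding(n_speakers, dim)
        # Toy upsampling decoder: one frame becomes `hop` waveform samples.
        self.decoder = nn.Sequential(
            nn.Conv1d(3 * dim, dim, kernel_size=7, padding=3),
            nn.LeakyReLU(0.1),
            nn.ConvTranspose1d(dim, 1, kernel_size=hop, stride=hop),
            nn.Tanh(),
        )

    def forward(self, content_ids, f0_ids, speaker_id):
        # content_ids, f0_ids: (batch, frames); speaker_id: (batch,)
        c = self.content_emb(content_ids)              # (B, T, D)
        f = self.f0_emb(f0_ids)                        # (B, T, D)
        s = self.speaker_emb(speaker_id).unsqueeze(1)  # (B, 1, D)
        s = s.expand(-1, c.size(1), -1)                # broadcast over frames
        x = torch.cat([c, f, s], dim=-1).transpose(1, 2)  # (B, 3D, T)
        return self.decoder(x).squeeze(1)              # (B, T * hop)


if __name__ == "__main__":
    model = ResynthesisDecoder()
    content = torch.randint(0, 100, (1, 50))  # 50 frames of discrete content units
    f0 = torch.randint(0, 32, (1, 50))        # quantized F0 track, same frame rate
    speaker = torch.tensor([7])               # target speaker index
    audio = model(content, f0, speaker)
    print(audio.shape)                        # torch.Size([1, 16000])
```

In this simplified framing, swapping the speaker index at synthesis time while keeping the content and F0 streams fixed is what enables the voice-conversion evaluation mentioned in the abstract.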