Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
PSQM
Intelligibility (philosophy)
Codec
Representation
DOI: 10.21437/interspeech.2021-475
Publication Date: 2021-08-27T05:59:39Z
AUTHORS (8)
ABSTRACT
We propose using self-supervised discrete representations for the task of speech resynthesis. To generate a disentangled representation, we separately extract low-bitrate representations of speech content, prosodic information, and speaker identity. This allows speech to be synthesized in a controllable manner. We analyze various state-of-the-art, self-supervised representation learning methods and shed light on the advantages of each method while considering reconstruction quality and disentanglement properties. Specifically, we evaluate F0 reconstruction, speaker identification performance (for both resynthesis and voice conversion), the recordings' intelligibility, and overall quality using subjective human evaluation. Lastly, we demonstrate how these representations can be used for an ultra-lightweight speech codec. Using the obtained representations, we can get to a rate of 365 bits per second while providing better speech quality than the baseline methods. Audio samples can be found under the following link: this http URL.
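As a rough illustration of how a bitrate figure for discrete codes can be estimated, the sketch below multiplies each stream's frame rate by the bits needed per code, assuming fixed-length coding with no entropy coding. The function name, stream names, frame rates, and codebook sizes are hypothetical and chosen only for easy arithmetic; they are not the paper's settings and are not meant to reproduce the 365 bits per second reported in the abstract. The sketch covers per-frame code streams only.

```python
import math


def stream_bitrate(frames_per_second: float, codebook_size: int) -> float:
    """Bits per second for one stream of discrete codes (no entropy coding)."""
    return frames_per_second * math.log2(codebook_size)


# Hypothetical streams: (frame rate in Hz, codebook size).
# These values are illustrative assumptions, not taken from the paper.
streams = {
    "content": (50.0, 128),  # 50 codes/s, 7 bits each
    "prosody": (12.5, 16),   # 12.5 codes/s, 4 bits each
}

total_bps = sum(stream_bitrate(rate, size) for rate, size in streams.values())
print(f"{total_bps:.1f} bits per second")
```

Under these assumed values the two streams contribute 350 and 50 bits per second respectively, which shows why a low frame rate and a small codebook keep the overall rate in the hundreds of bits per second.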
REFERENCES (0)
CITATIONS (96)