NFDI4DS | UHH-SEMS - Publication Details

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

PSQM Intelligibility (philosophy) Codec Representation

DOI: 10.21437/interspeech.2021-475

Publication Date: 2021-08-27T05:59:39Z

AUTHORS (8)

Adam Polyak

Yossi Adi

Jade Copet

Eugene Kharitonov

Kushal Lakhotia

Wei-Ning Hsu

Abdelrahman Mohamed

Emmanuel Dupoux

ABSTRACT

We propose using self-supervised discrete representations for the task of speech resynthesis. To generate a disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speaker identity. This allows us to synthesize speech in a controllable manner. We analyze various state-of-the-art, self-supervised representation learning methods and shed light on the advantages of each method while considering reconstruction quality and disentanglement properties. Specifically, we evaluate the F0 reconstruction, speaker identification performance (for both resynthesis and voice conversion), recordings' intelligibility, and overall quality using subjective human evaluation. Lastly, we demonstrate how these representations can be used for an ultra-lightweight speech codec. Using the obtained representations, we can get to a rate of 365 bits per second while providing better speech quality than the baseline methods. Audio samples can be found under the following link: this http URL.
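As a rough illustration of how a figure in bits per second arises from discrete representations, the sketch below computes the fixed-length coding rate of a stream of codebook indices. The function name `stream_bitrate`, the stream names, and all codebook sizes and frame rates are hypothetical values chosen for easy arithmetic; they are not the paper's configuration and do not reproduce the 365 bits per second reported in the abstract.

```python
import math


def stream_bitrate(codebook_size: int, frames_per_second: float) -> float:
    """Bits per second for a stream of discrete codes, assuming each code
    is stored with a fixed-length index of log2(codebook_size) bits."""
    return math.log2(codebook_size) * frames_per_second


# Hypothetical settings for illustration only (not taken from the paper).
streams = {
    "content": (128, 50.0),  # 7 bits per code, 50 codes per second
    "prosody": (32, 10.0),   # 5 bits per code, 10 codes per second
}

# Sum the per-stream rates; a one-time speaker embedding is not counted here.
total = sum(stream_bitrate(size, rate) for size, rate in streams.values())
print(f"total bitrate: {total:.1f} bits per second")
```

Under these assumed settings the content stream dominates the total, which suggests why the choice of codebook size and frame rate for the content units matters most for a low-bitrate codec. Entropy coding could lower the rate further, so this figure is best read as an upper bound for the assumed streams.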

REFERENCES (0)

CITATIONS (96)
