StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
FOS: Computer and information sciences
Computer Science - Machine Learning
Sound (cs.SD)
Machine Learning (stat.ML)
02 engineering and technology
Computer Science - Sound
Machine Learning (cs.LG)
Statistics - Machine Learning
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Electrical Engineering and Systems Science - Audio and Speech Processing
DOI:
10.21437/interspeech.2019-2236
Publication Date:
2019-09-13T20:32:51Z
AUTHORS (4)
ABSTRACT
Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings among multiple domains without relying on parallel data. This is important but challenging owing to the requirement of learning multiple mappings and the non-availability of explicit supervision. Recently, StarGAN-VC has garnered attention owing to its ability to solve this problem using only a single generator. However, there is still a gap between real and converted speech. To bridge this gap, we rethink conditional methods of StarGAN-VC, which are key components for achieving non-parallel multi-domain VC in a single model, and propose an improved variant called StarGAN-VC2. Particularly, we rethink conditional methods in two aspects: training objectives and network architectures. For the former, we propose a source-and-target conditional adversarial loss that allows all source domain data to be convertible to the target domain data. For the latter, we introduce a modulation-based conditional method that can transform the modulation of the acoustic feature in a domain-specific manner. We evaluated our methods on non-parallel multi-speaker VC. An objective evaluation demonstrates that our proposed methods improve speech quality in terms of both global and local structure measures. Furthermore, a subjective evaluation shows that StarGAN-VC2 outperforms StarGAN-VC in terms of naturalness and speaker similarity.
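The modulation-based conditional method mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes one common realization: instance normalization over time followed by a scale and shift chosen per source-and-target domain pair. The function name `conditional_instance_norm`, the tables `gamma_table` and `beta_table`, and all sizes are hypothetical, and the parameters are random here rather than learned.

```python
import numpy as np


def conditional_instance_norm(feature, gamma, beta, eps=1e-5):
    """Normalize each channel over time, then apply a domain-specific scale and shift.

    feature: array of shape (channels, frames)
    gamma, beta: arrays of shape (channels,), selected by the domain code
    """
    mean = feature.mean(axis=1, keepdims=True)
    std = feature.std(axis=1, keepdims=True)
    normalized = (feature - mean) / (std + eps)
    return gamma[:, None] * normalized + beta[:, None]


rng = np.random.default_rng(0)
num_domains, channels, frames = 4, 8, 100

# Hypothetical modulation parameters, one set per (source, target) pair.
# In a trained model these would be learned; here they are random (assumption).
gamma_table = rng.normal(1.0, 0.1, size=(num_domains, num_domains, channels))
beta_table = rng.normal(0.0, 0.1, size=(num_domains, num_domains, channels))

# Stand-in for an intermediate acoustic feature map inside the generator.
feature = rng.normal(size=(channels, frames))

source, target = 0, 2
modulated = conditional_instance_norm(
    feature, gamma_table[source, target], beta_table[source, target]
)
```

After the call, each channel of `modulated` has a mean close to the selected `beta` and a standard deviation close to the selected `gamma`, which is the sense in which the feature statistics are transformed in a domain-specific manner in this sketch.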
CITATIONS (93)