Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
FOS: Computer and information sciences
03 medical and health sciences
Sound (cs.SD)
Statistics - Machine Learning
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Machine Learning (stat.ML)
0305 other medical science
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
DOI: 10.48550/arxiv.1711.11293
Publication Date: 2017-01-01
AUTHORS (2)
ABSTRACT
We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is general purpose, high quality, and parallel-data free, and it works without any extra data, modules, or alignment procedure. It also avoids over-smoothing, which occurs in many conventional statistical model-based VC methods. Our method, called CycleGAN-VC, uses a cycle-consistent adversarial network (CycleGAN) with gated convolutional neural networks (CNNs) and an identity-mapping loss. A CycleGAN learns forward and inverse mappings simultaneously using adversarial and cycle-consistency losses. This makes it possible to find an optimal pseudo pair from unpaired data. Furthermore, the adversarial loss contributes to reducing over-smoothing of the converted feature sequence. We configure a CycleGAN with gated CNNs and train it with an identity-mapping loss. This allows the mapping function to capture sequential and hierarchical structures while preserving linguistic information. We evaluated our method on a parallel-data-free VC task. An objective evaluation showed that the converted feature sequence was near natural in terms of global variance and modulation spectra. A subjective evaluation showed that the quality of the converted speech was comparable to that obtained with a Gaussian mixture model-based method under advantageous conditions with parallel data and twice the amount of data.
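The abstract combines three training terms (adversarial, cycle-consistency, and identity-mapping losses) with gated CNNs. The following is a minimal pure-Python sketch of how those terms fit together, not the authors' implementation. It assumes one-dimensional feature sequences, L1 norms for the cycle and identity terms, the standard log form of the adversarial term, and illustrative weights `lam_cyc=10.0` and `lam_id=5.0`. The gated layer is a gated linear unit, (x * W + b) ⊗ σ(x * V + c). All function names, the toy generators, and the toy discriminator are hypothetical; a real system would train multi-layer gated CNNs in a deep-learning framework.

```python
import math
from typing import Callable, List

Seq = List[float]  # hypothetical simplification: one feature dimension over time


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def glu_conv1d(x: Seq, w: Seq, v: Seq, b: float = 0.0, c: float = 0.0) -> Seq:
    """Gated linear unit: (x * w + b) multiplied elementwise by sigmoid(x * v + c).

    '*' is a 1-D cross-correlation with zero 'same' padding, so the output has
    the same length as x (assumes an odd kernel length).
    """
    k = len(w)
    pad = (k - 1) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    out = []
    for t in range(len(x)):
        window = xp[t:t + k]
        lin = sum(wi * xi for wi, xi in zip(w, window)) + b    # linear path
        gate = sum(vi * xi for vi, xi in zip(v, window)) + c   # gate path
        out.append(lin * sigmoid(gate))
    return out


def l1(a: Seq, b: Seq) -> float:
    """L1 distance between two equal-length sequences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def adversarial_loss(d: Callable[[Seq], float], real: List[Seq], fake: List[Seq]) -> float:
    """E[log D(real)] + E[log(1 - D(fake))]; D maximizes this, G minimizes it."""
    return mean([math.log(d(r)) for r in real]) + mean([math.log(1.0 - d(f)) for f in fake])


def cycle_loss(g_xy, g_yx, xs: List[Seq], ys: List[Seq]) -> float:
    """Forward-inverse and inverse-forward mappings should reconstruct the input."""
    return (mean([l1(g_yx(g_xy(x)), x) for x in xs])
            + mean([l1(g_xy(g_yx(y)), y) for y in ys]))


def identity_loss(g_xy, g_yx, xs: List[Seq], ys: List[Seq]) -> float:
    """An input already in a generator's target domain should pass through unchanged."""
    return (mean([l1(g_xy(y), y) for y in ys])
            + mean([l1(g_yx(x), x) for x in xs]))


def full_objective(g_xy, g_yx, d_x, d_y, xs, ys, lam_cyc=10.0, lam_id=5.0) -> float:
    """Sum of both adversarial terms plus weighted cycle and identity terms."""
    adv = (adversarial_loss(d_y, ys, [g_xy(x) for x in xs])
           + adversarial_loss(d_x, xs, [g_yx(y) for y in ys]))
    return (adv
            + lam_cyc * cycle_loss(g_xy, g_yx, xs, ys)
            + lam_id * identity_loss(g_xy, g_yx, xs, ys))


if __name__ == "__main__":
    xs = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]  # toy "source" sequences
    ys = [[1.5, 2.5, 3.5]]                     # toy "target" sequence

    def toy_generator(s: Seq) -> Seq:
        # Hypothetical single GLU layer; center tap passes x, gate is mostly open.
        return glu_conv1d(s, [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], c=4.0)

    def toy_discriminator(s: Seq) -> float:
        # Hypothetical stand-in: probability that s is "real".
        return sigmoid(mean(s))

    print(full_objective(toy_generator, toy_generator,
                         toy_discriminator, toy_discriminator, xs, ys))
```

In this sketch, the cycle term is what lets unpaired `xs` and `ys` be used at all, because each sequence only has to be reconstructed from itself. The identity term penalizes a generator for altering inputs that already belong to its output domain. The adversarial term is the one the abstract credits with reducing over-smoothing.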