HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
FOS: Computer and information sciences
FOS: Electrical engineering, electronic engineering, information engineering
Computation and Language (cs.CL)
Machine Learning (cs.LG)
Sound (cs.SD)
Audio and Speech Processing (eess.AS)
DOI:
10.48550/arxiv.2404.04645
Publication Date:
2024-04-06
AUTHORS (5)
ABSTRACT
Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While TTS architectures that train and test on the same set of speakers have seen significant improvements, performance on out-of-domain speakers still faces severe limitations. Domain adaptation can be achieved by fine-tuning the whole model for each new domain, which is parameter-inefficient. Adapters provide a parameter-efficient alternative for domain adaptation. Although popular in NLP, Adapters have not brought much improvement to speech synthesis. In this work, we present HyperTTS, which comprises a small learnable network, a "hypernetwork", that generates the parameters of the Adapter blocks, allowing us to condition the Adapters on speaker representations and make them dynamic. Extensive evaluations on two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS against the baselines and present ablation studies. Promising results for dynamic Adapters generated by hypernetworks open up new avenues for domain-generic multi-speaker TTS systems. The audio samples and code are available at https://github.com/declare-lab/HyperTTS.
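To make the mechanism described in the abstract concrete, the sketch below shows a hypernetwork that maps a speaker embedding to the weights of a bottleneck Adapter, so the Adapter becomes speaker-conditioned. This is a minimal illustrative example, not the authors' implementation: the module names (HyperAdapter, hypernet), all dimensions, and the placement of the Adapter in a TTS backbone are assumptions.

```python
# Minimal sketch of a hypernetwork-generated, speaker-conditioned adapter.
# Assumptions: dimensions, module names, and adapter placement are illustrative only.
import torch
import torch.nn as nn


class HyperAdapter(nn.Module):
    """Bottleneck adapter whose weights are produced by a small hypernetwork."""

    def __init__(self, hidden_dim=256, bottleneck_dim=32, speaker_dim=64):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        # Hypernetwork: maps a speaker embedding to the flattened adapter parameters
        # (down-projection weight + bias, up-projection weight + bias).
        n_params = 2 * hidden_dim * bottleneck_dim + bottleneck_dim + hidden_dim
        self.hypernet = nn.Sequential(
            nn.Linear(speaker_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_params),
        )

    def forward(self, hidden_states, speaker_emb):
        # hidden_states: (batch, time, hidden_dim); speaker_emb: (batch, speaker_dim)
        params = self.hypernet(speaker_emb)
        h, b = self.hidden_dim, self.bottleneck_dim
        # Slice the flat parameter vector into per-speaker adapter weights and biases.
        w_down = params[:, : h * b].view(-1, h, b)
        b_down = params[:, h * b : h * b + b].unsqueeze(1)
        w_up = params[:, h * b + b : h * b + b + b * h].view(-1, b, h)
        b_up = params[:, h * b + b + b * h :].unsqueeze(1)
        # Down-project, apply a nonlinearity, up-project, then add a residual connection.
        z = torch.relu(torch.bmm(hidden_states, w_down) + b_down)
        return hidden_states + torch.bmm(z, w_up) + b_up


if __name__ == "__main__":
    adapter = HyperAdapter()
    x = torch.randn(4, 100, 256)   # e.g. hidden states from a TTS encoder layer
    spk = torch.randn(4, 64)       # speaker embeddings
    print(adapter(x, spk).shape)   # torch.Size([4, 100, 256])
```

In this sketch only the hypernetwork is trained, while each speaker embedding yields its own adapter weights at run time, which is what makes the Adapter "dynamic" rather than a single fixed set of parameters shared across speakers.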