WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; Machine Learning (cs.LG); Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); 01 natural sciences; 0105 earth and related environmental sciences
DOI: 10.48550/arxiv.2006.04598 Publication Date: 2020-01-01
ABSTRACT
In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time. However, these models require either a well-trained teacher network or a large number of flow steps, making them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE, which exploits a continuous normalizing flow for speech synthesis. Unlike the conventional models, WaveNODE places no constraint on the function used for the flow operation, thus allowing the usage of more flexible and complex functions. Moreover, WaveNODE can be optimized to maximize the likelihood without requiring any auxiliary loss terms. We experimentally show that WaveNODE achieves comparable performance with fewer parameters compared to conventional flow-based vocoders.
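The abstract does not describe the network itself, so the following is only a minimal sketch of the general continuous normalizing flow idea, not WaveNODE's architecture. It assumes a toy dynamics function f(z) = tanh(Wz + b) (hypothetical, chosen because its Jacobian trace has a closed form) and a fixed-step RK4 solver. The state z and the change in log-density are integrated together using the standard instantaneous change-of-variables relation d(log p)/dt = -tr(df/dz). Nothing about f has to be invertible by construction, which is the sense in which the flow function is unconstrained.

```python
import numpy as np

def dynamics(z, W, b):
    """Hypothetical unconstrained dynamics f(z) = tanh(W z + b)."""
    return np.tanh(W @ z + b)

def trace_jacobian(z, W, b):
    """Exact trace of df/dz for the tanh dynamics: sum_i (1 - h_i^2) W_ii."""
    h = np.tanh(W @ z + b)
    return float(np.sum((1.0 - h ** 2) * np.diag(W)))

def integrate(z0, W, b, t0=0.0, t1=1.0, steps=200):
    """Integrate dz/dt = f(z) and d(log p)/dt = -tr(df/dz) with fixed-step RK4.

    Returns the final state and the accumulated change in log-density.
    Swapping t0 and t1 runs the same flow in reverse.
    """
    def rhs(state):
        z = state[:-1]
        return np.append(dynamics(z, W, b), -trace_jacobian(z, W, b))

    # Augment the state with one extra slot that tracks the log-density change.
    state = np.append(np.asarray(z0, dtype=float), 0.0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return state[:-1], float(state[-1])

def log_standard_normal(z):
    """Log-density of a standard normal base distribution."""
    z = np.asarray(z, dtype=float)
    return float(-0.5 * (z @ z + z.size * np.log(2.0 * np.pi)))

# Usage: push a base sample through the flow and track its log-density.
rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 3))
b = rng.standard_normal(3)
z0 = rng.standard_normal(3)
z1, delta_logp = integrate(z0, W, b)
logp_z1 = log_standard_normal(z0) + delta_logp
```

Because the log-density is available exactly (up to solver error), a model of this kind can in principle be trained by maximizing likelihood alone, which matches the claim in the abstract; how WaveNODE parameterizes and conditions its dynamics is not stated here.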