Towards Controllable Audio Texture Morphing

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
DOI: 10.48550/arxiv.2304.11648 Publication Date: 2023-06-04
ABSTRACT
In this paper, we propose a data-driven approach to train a Generative Adversarial Network (GAN) conditioned on "soft labels" distilled from the penultimate layer of an audio classifier trained on a target set of audio texture classes. We demonstrate that interpolation between such conditions, or control vectors, provides smooth morphing between the generated audio textures, with morphing capability similar to or better than state-of-the-art methods. The proposed approach results in a well-organized latent space that generates novel audio outputs while remaining consistent with the semantics of the conditioning parameters. This is a step towards a general data-driven approach to designing generative audio models with customized controls capable of traversing out-of-distribution regions for novel sound synthesis.
Accepted to ICASSP 2023.
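As a rough illustration of the two ideas described in the abstract, the sketch below shows (1) how "soft-label" condition vectors might be distilled from the penultimate layer of a pretrained audio texture classifier and (2) how two such vectors could be linearly interpolated to drive a conditional GAN generator for morphing. This is a minimal PyTorch sketch under stated assumptions: the names classifier and generator, the latent size, the use of an nn.Sequential classifier, and plain linear interpolation are illustrative choices, not the authors' implementation.

    # Minimal sketch: soft-label distillation and condition-vector interpolation.
    import torch

    def penultimate_features(classifier: torch.nn.Sequential, x: torch.Tensor) -> torch.Tensor:
        """Return activations of the layer before the final classification layer.

        Assumes `classifier` is an nn.Sequential whose last child is the output layer.
        """
        feats = torch.nn.Sequential(*list(classifier.children())[:-1])(x)
        return feats.flatten(start_dim=1)

    def morph(generator, cond_a: torch.Tensor, cond_b: torch.Tensor,
              steps: int = 8, latent_dim: int = 128):
        """Generate audio along a straight line between two condition vectors."""
        z = torch.randn(1, latent_dim)                      # shared noise vector across steps
        outputs = []
        for alpha in torch.linspace(0.0, 1.0, steps):
            cond = (1 - alpha) * cond_a + alpha * cond_b    # interpolated control vector
            outputs.append(generator(z, cond))              # hypothetical conditional generator call
        return outputs

Keeping the noise vector fixed while only the condition vector is interpolated isolates the effect of the control parameters, which is one plausible way to probe the smoothness of the morph; the paper's actual conditioning mechanism and evaluation may differ.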