Generating Realistic Images from In-the-wild Sounds
Closed captioning
Modalities
Sound symbolism
Sound Quality
DOI:
10.48550/arxiv.2309.02405
Publication Date:
2023-01-01
AUTHORS (4)
ABSTRACT
Representing wild sounds as images is an important but challenging task due to the lack of paired datasets between sound and images and the significant differences in the characteristics of these two modalities. Previous studies have focused on generating images from sound in limited categories or music. In this paper, we propose a novel approach to generate images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of sound and visualize the sound. Lastly, we propose a direct sound optimization with CLIPscore and AudioCLIP and generate images with a diffusion-based model. In experiments, we show that our model is able to generate high-quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.
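The optimization step mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: random vectors stand in for the CLIP text embedding and the AudioCLIP audio embedding, the cosine-similarity score is a simplified stand-in for CLIPscore, and every name (`score`, `optimize`, `w_text`, `w_audio`) is hypothetical. It only shows the general idea of nudging a latent vector so that it agrees with both a caption embedding and an audio embedding.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cosine_grad(z, t):
    """Gradient of cosine(z, t) with respect to z."""
    nz = math.sqrt(sum(x * x for x in z))
    nt = math.sqrt(sum(y * y for y in t))
    c = cosine(z, t)
    return [ti / (nz * nt) - c * zi / (nz * nz) for zi, ti in zip(z, t)]

def score(z, text_emb, audio_emb, w_text=0.5, w_audio=0.5):
    """Weighted agreement of z with the text and audio embeddings (assumed weights)."""
    return w_text * cosine(z, text_emb) + w_audio * cosine(z, audio_emb)

def optimize(z, text_emb, audio_emb, steps=200, lr=0.1, w_text=0.5, w_audio=0.5):
    """Plain gradient ascent on the combined score; returns the updated vector."""
    z = list(z)
    for _ in range(steps):
        g_text = cosine_grad(z, text_emb)
        g_audio = cosine_grad(z, audio_emb)
        z = [zi + lr * (w_text * gt + w_audio * ga)
             for zi, gt, ga in zip(z, g_text, g_audio)]
    return z

# Placeholder embeddings: in a real system these would come from encoders.
rng = random.Random(0)
dim = 8
text_emb = [rng.gauss(0, 1) for _ in range(dim)]
audio_emb = [rng.gauss(0, 1) for _ in range(dim)]
z0 = [rng.gauss(0, 1) for _ in range(dim)]

z1 = optimize(z0, text_emb, audio_emb)
before = score(z0, text_emb, audio_emb)
after = score(z1, text_emb, audio_emb)
print(f"score before: {before:.3f}, after: {after:.3f}")
```

In this sketch the optimized vector drifts toward a direction between the two embeddings, which is the intuition behind combining a text-based and an audio-based similarity signal before image generation.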