Generating Realistic Images from In-the-wild Sounds

Closed captioning Modalities Sound symbolism Sound Quality

DOI: 10.48550/arxiv.2309.02405 Publication Date: 2023-01-01

AUTHORS (4)

Taegyeong Lee

Jeonghun Kang

Hyeonyu Kim

Tae-Hwan Kim

ABSTRACT

Representing wild sounds as images is an important but challenging task due to the lack of paired datasets between sound and images and the significant differences in the characteristics of these two modalities. Previous studies have focused on generating images from sound in limited categories or music. In this paper, we propose a novel approach to generate images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of sound and visualize the sound. Lastly, we propose a direct sound optimization with CLIPscore and AudioCLIP and generate images with a diffusion-based model. In experiments, it shows that our model is able to generate high quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.
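The abstract names CLIPscore as one of the signals used for optimization. As a rough illustration only, and not the authors' implementation, the sketch below computes the commonly used CLIPScore form, a weight times the clipped cosine similarity between an image embedding and a text embedding. The function names, the default weight of 2.5, and the toy vectors are assumptions for illustration; in practice the embeddings would come from a CLIP image encoder and text encoder.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_emb: Sequence[float],
               text_emb: Sequence[float],
               weight: float = 2.5) -> float:
    """CLIPScore-style value: weight * max(cos(image, text), 0).

    The embeddings here are placeholders (assumption); a real pipeline
    would obtain them from pretrained CLIP encoders.
    """
    return weight * max(cosine_similarity(image_emb, text_emb), 0.0)


if __name__ == "__main__":
    # Toy vectors standing in for an image embedding and a caption embedding.
    image_embedding = [0.2, 0.9, 0.1]
    caption_embedding = [0.1, 0.8, 0.3]
    print(clip_score(image_embedding, caption_embedding))
```

A higher value means the generated image and the caption derived from the sound agree more closely in the shared embedding space, which is why a score of this kind can serve as an optimization target.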
