Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics
TOPICS: Audio visual; Modality (human–computer interaction); Representation
DOI:
10.48550/arxiv.2401.13270
Publication Date:
2024-01-01
AUTHORS (7)
ABSTRACT
Automatic image colorization is inherently an ill-posed problem with uncertainty, which requires an accurate semantic understanding of scenes to estimate reasonable colors for grayscale images. Although recent interaction-based methods have achieved impressive performance, it is still a very difficult task to infer realistic and reasonable colors automatically. To reduce the difficulty of semantic understanding of scenes, this paper tries to utilize the corresponding audio, which naturally contains extra semantic information about the same scene. Specifically, a novel audio-infused automatic image colorization (AIAIC) network is proposed, which consists of three stages. First, we take color image semantics as a bridge and pretrain a colorization network guided by color image semantics. Second, the natural co-occurrence of audio and video is utilized to learn the correlation between audio and visual scenes. Third, the implicit audio semantic representation is fed into the pretrained network to finally realize audio-guided colorization. The whole process is trained in a self-supervised manner without human annotation. In addition, an audiovisual colorization dataset is established for training and testing. Experiments demonstrate that audio guidance can effectively improve the performance of automatic colorization, especially for some scenes that are difficult to understand only from the visual modality.
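The three-stage pipeline described in the abstract can be illustrated with a toy numerical sketch. This is not the authors' implementation: the dimensions, the single linear "colorizer", and the least-squares audio-to-visual projection are all simplified stand-ins for the real networks and correlation learning, chosen only to make the data flow of the three stages concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
D_SEM = 16   # shared semantic embedding size
D_AUD = 32   # raw audio feature size
H, W = 8, 8  # toy image resolution

def colorize(gray, sem, W_c):
    """Stage-1 colorizer: predict the two chroma (ab) channels of a
    grayscale image conditioned on a semantic embedding. A single
    linear map stands in for the pretrained colorization network."""
    x = np.concatenate([gray.ravel(), sem])        # fuse image + semantics
    return (W_c @ x).reshape(2, H, W)              # ab-channel prediction

# Stage 2: learn a projection mapping audio features into the visual
# semantic space, exploiting audio-video co-occurrence (least squares
# stands in for the paper's self-supervised correlation learning).
A = rng.normal(size=(100, D_AUD))                  # co-occurring audio feats
V = rng.normal(size=(100, D_SEM))                  # paired visual semantics
W_a, *_ = np.linalg.lstsq(A, V, rcond=None)        # audio -> semantic map

# Stage 3: at test time, the grayscale image plus the *audio-derived*
# semantic embedding drive the pretrained colorizer.
W_c = rng.normal(size=(2 * H * W, H * W + D_SEM))  # "pretrained" weights
gray = rng.normal(size=(H, W))
audio_feat = rng.normal(size=D_AUD)
ab = colorize(gray, audio_feat @ W_a, W_c)
print(ab.shape)  # (2, 8, 8)
```

The key structural point the sketch preserves is that color-image semantics (stage 1) and audio-derived semantics (stage 3) share one embedding space, so the colorizer pretrained on the former can be driven by the latter without retraining.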