Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

DOI: 10.48550/arxiv.2305.11540 Publication Date: 2023-01-01
ABSTRACT
Diffusion models have made impressive progress in text-to-image synthesis. However, training such large-scale models (e.g. Stable Diffusion) from scratch requires high computational costs and massive high-quality text-image pairs, which becomes unaffordable in other languages. To handle this challenge, we propose IAP, a simple but effective method to transfer English Stable Diffusion into Chinese. IAP optimizes only a separate Chinese text encoder, with all other parameters fixed, to align the Chinese semantics space to the English one in CLIP. To achieve this, we innovatively treat images as pivots and minimize the distance of attentive features produced from cross-attention between images and each language respectively. In this way, IAP establishes connections of Chinese, English, and visual semantics in CLIP's embedding space efficiently, advancing the quality of the generated image with direct Chinese prompts. Experimental results show that our method outperforms several strong Chinese diffusion models with only 5%~10% training data.
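The images-as-pivots objective can be illustrated with a minimal sketch. It assumes that image features act as queries and text token features act as both keys and values, and that the distance is a mean squared error; the abstract does not specify these details. All names here (`cross_attention`, `pivot_loss`, the toy feature vectors) are hypothetical and are not taken from the authors' code.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    # Each query attends over keys_values, which serve as keys and values
    # (assumption: no learned projections, scaled dot-product scores).
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out

def pivot_loss(image_feats, english_feats, chinese_feats):
    # Image features are the pivot: both languages are attended from the
    # same image queries, so the outputs have the same shape even when
    # the two captions have different token counts.
    att_en = cross_attention(image_feats, english_feats)  # frozen side
    att_zh = cross_attention(image_feats, chinese_feats)  # trainable side
    sq = [(a - b) ** 2
          for row_en, row_zh in zip(att_en, att_zh)
          for a, b in zip(row_en, row_zh)]
    return sum(sq) / len(sq)  # mean squared distance (assumed metric)

# Toy data: 2 image patches, 3 English tokens, feature dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
english_feats = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
chinese_aligned = [row[:] for row in english_feats]  # perfectly aligned
chinese_unaligned = [[-1.0, 0.0], [0.0, -1.0]]       # 2 tokens, misaligned

loss_aligned = pivot_loss(image_feats, english_feats, chinese_aligned)
loss_unaligned = pivot_loss(image_feats, english_feats, chinese_unaligned)
```

In a training setup along these lines, only the Chinese text encoder that produces `chinese_feats` would receive gradients from this loss, while the image and English features would come from frozen components.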