SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance
Adapter (computing)
Scene graph
DOI:
10.48550/arxiv.2405.15321
Publication Date:
2024-05-24
AUTHORS (12)
ABSTRACT
Recent advancements in text-to-image generation have been propelled by the development of diffusion models and multi-modality learning. However, since text is typically represented sequentially in these models, it often falls short in providing accurate contextualization and structural control. As a result, the generated images do not consistently align with human expectations, especially in complex scenarios involving multiple objects and relationships. In this paper, we introduce the Scene Graph Adapter (SG-Adapter), leveraging the structured representation of scene graphs to rectify inaccuracies in the original text embeddings. The SG-Adapter's explicit and non-fully connected graph representation greatly improves the fully connected, transformer-based text representations. This enhancement is particularly notable in maintaining precise correspondence in scenarios involving multiple relationships. To address the challenges posed by low-quality annotated datasets like Visual Genome, we have manually curated a highly clean, multi-relational scene graph-image paired dataset, MultiRels. Furthermore, we design three metrics derived from GPT-4V to effectively and thoroughly measure the correspondence between images and scene graphs. Both qualitative and quantitative results validate the efficacy of our approach in controlling the correspondence in multiple relationships.
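The contrast the abstract draws between a fully connected text representation and an explicit, non-fully connected graph representation can be illustrated with a small attention-mask sketch. This is not the SG-Adapter implementation, which the abstract does not detail; it only assumes, for illustration, that each scene graph triplet (subject, relation, object) forms a group of tokens that attend to each other and not to tokens of other triplets. The names triplet_token_owners and block_attention_mask are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a scene graph as a list of
# (subject, relation, object) triplets.
Triplet = Tuple[str, str, str]


def triplet_token_owners(triplets: List[Triplet]) -> Tuple[List[str], List[int]]:
    """Flatten triplets into tokens and record which triplet owns each token."""
    tokens: List[str] = []
    owners: List[int] = []
    for index, triplet in enumerate(triplets):
        for part in triplet:
            for word in part.split():
                tokens.append(word)
                owners.append(index)
    return tokens, owners


def block_attention_mask(owners: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Assumption for illustration: attention is allowed only between tokens
    of the same triplet, unlike a fully connected mask where every entry
    would be True.
    """
    return [[a == b for b in owners] for a in owners]


triplets = [("man", "riding", "horse"), ("woman", "holding", "umbrella")]
tokens, owners = triplet_token_owners(triplets)
mask = block_attention_mask(owners)

for token, row in zip(tokens, mask):
    print(f"{token:>9}", "".join("1" if allowed else "." for allowed in row))
```

In this sketch "riding" can attend to "man" and "horse" but not to "woman" or "umbrella", which is the kind of relation-to-object correspondence that a fully connected mask does not enforce.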