Capability-aware Prompt Reformulation Learning for Text-to-Image Generation

FOS: Computer and information sciences; Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
DOI: 10.48550/arxiv.2403.19716 Publication Date: 2024-03-27
ABSTRACT
Text-to-image generation systems have emerged as revolutionary tools in the realm of artistic creation, offering unprecedented ease in transforming textual prompts into visual art. However, the efficacy of these systems is intricately linked to the quality of user-provided prompts, which often poses a challenge to users unfamiliar with prompt crafting. This paper addresses this challenge by leveraging user reformulation data from interaction logs to develop an automatic prompt reformulation model. Our in-depth analysis reveals that user prompt reformulation is heavily dependent on the individual user's capability, resulting in significant variance in the quality of reformulation pairs. To effectively use this data for training, we introduce the Capability-aware Prompt Reformulation (CAPR) framework. CAPR innovatively integrates user capability into the reformulation process through two key components: the Conditional Reformulation Model (CRM) and Configurable Capability Features (CCF). CRM reformulates prompts according to a specified user capability, as represented by CCF. The CCF, in turn, offers the flexibility to tune and guide the CRM's behavior. This enables CAPR to effectively learn diverse reformulation strategies across various user capacities and to simulate high-capability user reformulation during inference. Extensive experiments on standard text-to-image generation benchmarks showcase CAPR's superior performance over existing baselines and its remarkable robustness on unseen systems. Furthermore, comprehensive analyses validate the effectiveness of different components. CAPR can facilitate user-friendly interaction with text-to-image systems and make advanced artistic creation more achievable for a broader range of users.
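
The abstract does not say how the capability features are encoded or fed to the model, so the following is only a minimal sketch of one plausible reading: capability scores are bucketed into control tokens and prepended to the prompt, with logged values used for training and maximal values used at inference to imitate a high-capability user. The class `CapabilityFeatures`, its fields `quality` and `relevance`, the helper names, the tag format, and the bucket count are all hypothetical and are not taken from the paper.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CapabilityFeatures:
    """Hypothetical stand-in for CCF: capability scores in [0, 1]."""
    quality: float    # assumed feature, not named in the abstract
    relevance: float  # assumed feature, not named in the abstract


def bucket(score: float, n_buckets: int = 10) -> int:
    """Map a score in [0, 1] to an integer bucket in [0, n_buckets - 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return min(int(score * n_buckets), n_buckets - 1)


def build_conditioned_input(prompt: str, features: CapabilityFeatures,
                            n_buckets: int = 10) -> str:
    """Prefix the prompt with one control tag per capability feature."""
    tags = " ".join(
        f"<{name}_{bucket(value, n_buckets)}>"
        for name, value in asdict(features).items()
    )
    return f"{tags} {prompt}"


def high_capability() -> CapabilityFeatures:
    """Inference-time setting: request the top bucket for every feature."""
    return CapabilityFeatures(quality=1.0, relevance=1.0)


if __name__ == "__main__":
    # Training: the condition reflects the capability observed in the logs.
    logged = CapabilityFeatures(quality=0.5, relevance=1.0)
    print(build_conditioned_input("a cat on a sofa", logged))

    # Inference: the condition is set by hand to simulate a skilled user.
    print(build_conditioned_input("a cat on a sofa", high_capability()))
```

Under these assumptions, a sequence-to-sequence model trained on the tagged inputs could learn from low- and high-capability reformulation pairs alike, while the tags let the caller pick which behavior to reproduce at inference.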