Cached Multi-LoRA Composition for Multi-Concept Image Generation
FOS: Computer and information sciences
Artificial Intelligence (cs.AI)
Computer Science - Artificial Intelligence
Computer Vision and Pattern Recognition (cs.CV)
Computer Science - Computer Vision and Pattern Recognition
DOI:
10.48550/arxiv.2502.04923
Publication Date:
2025-02-07
AUTHORS (4)
ABSTRACT
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models, enabling precise rendering of multiple distinct elements, such as characters and styles, in multi-concept image generation. However, current approaches face significant challenges when composing these LoRAs for multi-concept image generation, resulting in diminished generated image quality. In this paper, we initially investigate the role of LoRAs in the denoising process through the lens of the Fourier frequency domain. Based on the hypothesis that applying multiple LoRAs could lead to "semantic conflicts", we find that certain LoRAs amplify high-frequency features such as edges and textures, whereas others mainly focus on low-frequency elements, including the overall structure and smooth color gradients. Building on these insights, we devise a frequency-domain-based sequencing strategy to determine the optimal order in which LoRAs should be integrated during inference. This strategy offers a methodical and generalizable solution compared to the naive integration commonly found in existing LoRA fusion techniques. To fully leverage our proposed LoRA order sequence determination method in multi-LoRA composition tasks, we introduce a novel, training-free framework, Cached Multi-LoRA (CMLoRA), designed to efficiently integrate multiple LoRAs while maintaining cohesive image generation. With its flexible backbone for multi-LoRA fusion and a non-uniform caching strategy tailored to individual LoRAs, CMLoRA has the potential to reduce semantic conflicts in LoRA composition and improve computational efficiency. Our experimental evaluations demonstrate that CMLoRA outperforms state-of-the-art training-free LoRA fusion methods by a significant margin: it achieves an average improvement of $2.19\%$ in CLIPScore and $11.25\%$ in MLLM win rate compared to LoraHub, LoRA Composite, and LoRA Switch.
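The frequency-domain idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names `high_freq_ratio` and `order_by_high_freq`, the radial cutoff of 0.25 cycles per pixel, the use of a single 2D feature map per LoRA, and the choice to sort from most to least high-frequency energy are all assumptions made purely for illustration. The abstract does not state the actual metric or the ordering direction.

```python
import numpy as np


def high_freq_ratio(feature_map: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a radial frequency cutoff.

    `cutoff` is in cycles per pixel (Nyquist is 0.5); the value 0.25 is an
    arbitrary illustrative choice, not one taken from the paper.
    """
    h, w = feature_map.shape
    # Power spectrum of the 2D map, with the zero frequency shifted to the center.
    energy = np.abs(np.fft.fftshift(np.fft.fft2(feature_map))) ** 2
    # Radial frequency of every spectral bin, shifted the same way.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[radius > cutoff].sum() / total)


def order_by_high_freq(maps: dict) -> list:
    """Sort hypothetical LoRA names by descending high-frequency energy ratio."""
    return sorted(maps, key=lambda name: high_freq_ratio(maps[name]), reverse=True)


# Toy stand-ins for per-LoRA feature maps (hypothetical data).
yy, xx = np.mgrid[0:16, 0:16]
texture_like = (-1.0) ** (yy + xx)            # checkerboard: energy at Nyquist
structure_like = np.sin(2 * np.pi * xx / 16)  # one slow horizontal cycle

order = order_by_high_freq({"structure": structure_like, "texture": texture_like})
print(order)
```

In this toy setup the checkerboard map concentrates its energy at the highest representable frequency, while the slow sinusoid sits well below the cutoff, so the two maps separate cleanly. Real denoising features would need a more careful choice of cutoff and aggregation across channels and timesteps.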