M2M-Gen: A Multimodal Framework for Automated Background Music Generation in Japanese Manga Using Large Language Models

DOI: 10.48550/arXiv.2410.09928 | Publication Date: 2024-10-13
ABSTRACT
This paper introduces M2M-Gen, a multimodal framework for generating background music tailored to Japanese manga. The key challenges in this task are the lack of an available dataset and of an established baseline. To address these challenges, we propose an automated music generation pipeline that produces background music for an input manga book. Initially, we use the dialogues in the manga to detect scene boundaries and perform emotion classification using the characters' faces within each scene. Then, GPT-4o translates this low-level scene information into a high-level music directive. Conditioned on the scene information and the directive, another instance of GPT-4o generates page-level music captions to guide a text-to-music model, producing music that is aligned with the manga's evolving narrative. The effectiveness of M2M-Gen is confirmed through extensive subjective evaluations, showcasing its capability to generate higher-quality, more relevant, and more consistent music that complements specific scenes when compared to our baselines.
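
The abstract describes a two-stage prompting pipeline: low-level scene information (dialogue-derived scene boundaries and face-based emotions) is first translated into a high-level music directive, and a second GPT-4o call then turns the scene and directive into a page-level caption for a text-to-music model. The sketch below illustrates how such a pipeline could be orchestrated, assuming the OpenAI Python client for the GPT-4o calls; the function names, prompts, and input format are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the two-stage captioning pipeline described in the abstract.
# All function names, prompts, and data formats are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def gpt4o(prompt: str) -> str:
    """Single-turn GPT-4o call shared by both pipeline stages."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def music_caption_for_scene(dialogues: list[str], face_emotions: list[str]) -> str:
    """Turn low-level scene information into a caption for a text-to-music model."""
    scene_info = f"Dialogues: {dialogues}\nDetected emotions: {face_emotions}"

    # Stage 1: translate low-level scene information into a high-level music directive.
    directive = gpt4o(
        "Given this manga scene, write a short high-level music directive "
        f"(mood, tempo, instrumentation):\n{scene_info}"
    )

    # Stage 2: conditioned on the scene and the directive, produce a page-level
    # caption that can be fed to a text-to-music model.
    return gpt4o(
        "Write a one-sentence music caption for a text-to-music model.\n"
        f"Scene: {scene_info}\nDirective: {directive}"
    )

The returned caption would then be passed to whatever text-to-music model the pipeline uses; that downstream generation step is outside the scope of this sketch.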