M2M-Gen: A Multimodal Framework for Automated Background Music Generation in Japanese Manga Using Large Language Models

DOI: 10.48550/arXiv.2410.09928 | Publication Date: 2024-10-13
ABSTRACT
This paper introduces M2M-Gen, a multimodal framework for generating background music tailored to Japanese manga. The key challenges in this task are the lack of an available dataset and of an established baseline. To address these challenges, we propose an automated music generation pipeline that produces background music for an input manga book. Initially, we use the dialogues in the manga to detect scene boundaries and perform emotion classification using the characters' faces within each scene. Then, GPT-4o translates this low-level scene information into a high-level music directive. Conditioned on the scene information and the directive, another instance of GPT-4o generates page-level music captions to guide a text-to-music model, producing music that is aligned with the manga's evolving narrative. The effectiveness of M2M-Gen is confirmed through extensive subjective evaluations, showcasing its capability to generate higher-quality, more relevant, and more consistent music that complements specific scenes when compared to our baselines.
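
The abstract describes a two-stage prompting pipeline: low-level scene information (dialogue-derived scene boundaries and face-based emotions) is first translated into a high-level music directive, and a second GPT-4o call then turns the scene and directive into a page-level caption for a text-to-music model. The sketch below illustrates how such a pipeline could be orchestrated, assuming the OpenAI Python client for the GPT-4o calls; the function names, prompts, and input format are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the two-stage captioning pipeline described in the abstract.
# All function names, prompts, and data formats are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def gpt4o(prompt: str) -> str:
    """Single-turn GPT-4o call shared by both pipeline stages."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def music_caption_for_scene(dialogues: list[str], face_emotions: list[str]) -> str:
    """Turn low-level scene information into a caption for a text-to-music model."""
    scene_info = f"Dialogues: {dialogues}\nDetected emotions: {face_emotions}"

    # Stage 1: translate low-level scene information into a high-level music directive.
    directive = gpt4o(
        "Given this manga scene, write a short high-level music directive "
        f"(mood, tempo, instrumentation):\n{scene_info}"
    )

    # Stage 2: conditioned on the scene and the directive, produce a page-level
    # caption that can be fed to a text-to-music model.
    return gpt4o(
        "Write a one-sentence music caption for a text-to-music model.\n"
        f"Scene: {scene_info}\nDirective: {directive}"
    )

The returned caption would then be passed to whatever text-to-music model the pipeline uses; that downstream generation step is outside the scope of this sketch.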