Towards Efficient Large Multimodal Model Serving
FOS: Computer and information sciences
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
DOI:
10.48550/arxiv.2502.00937
Publication Date:
2025-02-02
AUTHORS (12)
ABSTRACT
Recent advances in generative AI have led to large multi-modal models (LMMs) capable of simultaneously processing inputs of various modalities such as text, images, video, and audio. While these models demonstrate impressive capabilities, efficiently serving them in production environments poses significant challenges due to their complex architectures and heterogeneous resource requirements. We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, on six representative open-source models. We investigate their multi-stage inference pipelines and resource utilization patterns that lead to unique systems design implications. We also present an in-depth analysis of production traces, uncovering unique workload characteristics, including variable, heavy-tailed request distributions, diverse modal combinations, and bursty traffic patterns. Our key findings reveal that different inference stages exhibit highly heterogeneous performance characteristics and resource demands, while concurrent requests across stages lead to significant performance interference. To address these challenges, we propose a decoupled serving architecture that enables independent resource allocation and adaptive scaling for each stage. We further propose optimizations such as stage colocation to maximize throughput while meeting latency objectives.
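The decoupled serving idea from the abstract can be illustrated with a toy simulation: each inference stage gets its own queue and its own independently scaled pool of workers, and requests flow from one stage to the next. This is a minimal sketch under assumed details; the stage names, the one-request-per-tick worker model, and the queue-depth scaling rule are illustrative inventions, not the paper's actual design.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Stage:
    """One inference stage (e.g. modality encoding, LLM decoding) with its
    own queue and an independently allocated pool of workers."""
    name: str
    workers: int = 1
    queue: deque = field(default_factory=deque)

    def scale(self) -> None:
        # Naive adaptive scaling (an assumption for this sketch):
        # roughly one worker per four queued requests, at least one.
        self.workers = max(1, len(self.queue) // 4 + 1)

    def step(self) -> list:
        # Each worker completes one queued request per simulated tick.
        done = []
        for _ in range(min(self.workers, len(self.queue))):
            done.append(self.queue.popleft())
        return done

def serve(requests, stages):
    """Drive all requests through the stages in order.

    Stages are stepped last-to-first each tick so a request spends at
    least one tick per stage. Returns (completed requests, ticks used).
    """
    stages[0].queue.extend(requests)
    completed, ticks = [], 0
    while any(s.queue for s in stages):
        for i in range(len(stages) - 1, -1, -1):
            stage = stages[i]
            stage.scale()          # each stage scales independently
            finished = stage.step()
            if i + 1 < len(stages):
                stages[i + 1].queue.extend(finished)
            else:
                completed.extend(finished)
        ticks += 1
    return completed, ticks
```

Because each `Stage` scales on its own queue depth, a burst of image-heavy requests can grow the encoding pool without over-provisioning the decoding pool, which is the core benefit the abstract attributes to decoupling.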