DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities

DOI: 10.48550/arxiv.2502.11123 Publication Date: 2025-02-16
ABSTRACT
Real-time speech conversation is essential for natural and efficient human-machine interactions, requiring duplex and streaming capabilities. Traditional Transformer-based conversational chatbots operate in a turn-based manner and exhibit quadratic computational complexity that grows as the input size increases. In this paper, we propose DuplexMamba, a Mamba-based end-to-end multimodal duplex model for speech-to-text conversation. DuplexMamba enables simultaneous input processing and output generation, dynamically adjusting to support real-time streaming. Specifically, we develop a Mamba-based speech encoder and adapt it with a Mamba-based language model. Furthermore, we introduce a novel duplex decoding strategy that enables DuplexMamba to process input and generate output simultaneously. Experimental results demonstrate that DuplexMamba successfully implements duplex and streaming capabilities while achieving performance comparable to several recently developed Transformer-based models in automatic speech recognition (ASR) tasks and voice assistant benchmark evaluations.
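As a rough illustration of the complexity claim in the abstract, the sketch below counts how much work grows with input length for causal self-attention versus a recurrent, state-space-style scan. The function names are hypothetical and the counts ignore constant factors such as hidden size and number of layers. It is a back-of-the-envelope comparison, not the paper's implementation.

```python
def attention_total_ops(n: int) -> int:
    """Pairwise score computations for causal self-attention over n tokens."""
    # Token t attends to tokens 1..t, so the total is 1 + 2 + ... + n.
    return n * (n + 1) // 2


def recurrent_total_ops(n: int) -> int:
    """State updates for a recurrent (state-space-style) scan over n tokens."""
    # One fixed-size state update per token, independent of history length.
    return n


# Doubling the input length roughly quadruples the attention count,
# while the recurrent count only doubles.
for n in (1_000, 2_000, 4_000):
    print(n, attention_total_ops(n), recurrent_total_ops(n))
```

In a streaming setting this difference matters because each newly arriving audio frame costs more to process under attention as the conversation grows, whereas a fixed-size recurrent state keeps the per-frame cost flat.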