TF-Mamba: A Time-Frequency Network for Sound Source Localization
FOS: Computer and information sciences
Sound (cs.SD)
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
DOI:
10.48550/arxiv.2409.05034
Publication Date:
2024-09-08
AUTHORS (2)
ABSTRACT
Sound source localization (SSL) determines the position of sound sources using multi-channel audio data. It is commonly used to improve speech enhancement and separation. Extracting spatial features is crucial for SSL, especially in challenging acoustic environments. Previous studies performed well based on long short-term memory models. Recently, a novel scalable state space model (SSM) referred to as Mamba demonstrated notable performance across various sequence-based modalities, including speech. This study introduces Mamba to SSL tasks. We consider a Mamba-based model to analyze spatial features from signals by fusing both time and frequency features, and we develop an SSL system called TF-Mamba. The system integrates time and frequency fusion, with bidirectional Mamba managing time-wise and frequency-wise processing. We conduct experiments on a simulated dataset and the LOCATA dataset. Experiments show that TF-Mamba significantly outperforms other advanced methods on simulated and real-world data.
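The time-wise and frequency-wise bidirectional processing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. A toy scalar linear recurrence with fixed parameters stands in for Mamba, whose parameters are input-dependent in practice. The input is a single-channel toy spectrogram rather than multi-channel features. The function names, the decay value `a`, and the order of the two passes are all assumptions made for illustration.

```python
# Illustrative sketch only: a toy scalar state-space scan stands in for Mamba.
# All names and parameter values are hypothetical and not taken from the paper.

def ssm_scan(seq, a=0.5, b=1.0, c=1.0):
    """Linear recurrence h_t = a*h_{t-1} + b*x_t with output y_t = c*h_t."""
    h = 0.0
    out = []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out

def bidirectional_scan(seq, a=0.5):
    """Sum of a forward scan and a backward scan over the same sequence."""
    fwd = ssm_scan(seq, a)
    bwd = ssm_scan(seq[::-1], a)[::-1]  # scan the reversed input, then undo the reversal
    return [f + g for f, g in zip(fwd, bwd)]

def time_wise(spec, a=0.5):
    """Scan along the time axis, independently for each frequency bin (row)."""
    return [bidirectional_scan(row, a) for row in spec]

def frequency_wise(spec, a=0.5):
    """Scan along the frequency axis, independently for each time frame (column)."""
    columns = [list(col) for col in zip(*spec)]        # transpose to (time, freq)
    scanned = [bidirectional_scan(col, a) for col in columns]
    return [list(row) for row in zip(*scanned)]        # transpose back to (freq, time)

def tf_block(spec, a=0.5):
    """Time-wise pass followed by a frequency-wise pass (this ordering is an assumption)."""
    return frequency_wise(time_wise(spec, a), a)

# Toy "spectrogram": 2 frequency bins x 3 time frames with a single impulse.
spec = [[1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0]]
out = tf_block(spec)
print(out)  # [[4.0, 1.0, 0.5], [1.0, 0.25, 0.125]]
```

In this toy example the single impulse spreads along the time axis in the first pass and along the frequency axis in the second, so every output cell depends on context from both axes. That is the intuition behind fusing time and frequency features, shown here under the simplifying assumptions stated above.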