
Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; Electrical Engineering and Systems Science - Audio and Speech Processing

DOI: 10.48550/arxiv.2409.06954
Publication Date: 2024-01-01


AUTHORS (4)

Qiao, Yue

Kothapally, Vinay

Yu, Meng

Yu, Dong

ABSTRACT

Submitted to ICASSP 2025.

Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage network architecture for encoding circular microphone array signals into second-order Ambisonics (SOA) in multi-speaker environments. In addition, we introduce: (i) a novel loss function based on spatial power maps to regularize inter-channel correlations of the Ambisonic signals, and (ii) a channel permutation technique to resolve the ambiguity of encoding vertical information using a horizontal circular array. Evaluation on simulated speech and noise datasets shows that our approach consistently outperforms traditional signal processing (SP) and DL-based methods, providing significantly better timbral and spatial quality and higher source localization accuracy. Binaural audio demos with visualizations are available at https://bridgoon97.github.io/NeuralAmbisonicEncoding/.
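
To illustrate the vertical ambiguity the abstract mentions, the sketch below encodes a single plane wave into second-order Ambisonics and checks which channels change sign when the source is mirrored across the horizontal plane. It is not the authors' network or their channel permutation technique. It assumes the common ACN channel order with SN3D normalization (the abstract does not state which convention the paper uses), and the function names `encode_soa_sn3d` and `sign_flipped_channels` are hypothetical.

```python
import math

# ACN channel order for second-order Ambisonics (9 channels).
# Assumption: ACN/SN3D convention; the source does not specify one.
ACN_NAMES = ["W", "Y", "Z", "X", "V", "T", "R", "S", "U"]


def encode_soa_sn3d(azimuth, elevation):
    """Return the 9 SN3D-normalized, ACN-ordered gains for a plane wave.

    Angles are in radians. Azimuth is measured counter-clockwise from the
    front, elevation is positive upwards.
    """
    ca, sa = math.cos(azimuth), math.sin(azimuth)
    ce, se = math.cos(elevation), math.sin(elevation)
    k = math.sqrt(3.0) / 2.0
    return [
        1.0,                                   # W (order 0)
        sa * ce,                               # Y
        se,                                    # Z
        ca * ce,                               # X
        k * math.sin(2 * azimuth) * ce * ce,   # V
        k * sa * math.sin(2 * elevation),      # T
        0.5 * (3 * se * se - 1),               # R
        k * ca * math.sin(2 * elevation),      # S
        k * math.cos(2 * azimuth) * ce * ce,   # U
    ]


def sign_flipped_channels(azimuth, elevation, tol=1e-12):
    """List the channels whose gain changes sign when elevation is negated."""
    up = encode_soa_sn3d(azimuth, elevation)
    down = encode_soa_sn3d(azimuth, -elevation)
    return [
        name
        for name, a, b in zip(ACN_NAMES, up, down)
        if abs(a) > tol and abs(a + b) < tol
    ]


if __name__ == "__main__":
    flipped = sign_flipped_channels(math.radians(30), math.radians(20))
    print(flipped)  # ['Z', 'T', 'S']
```

Only Z, T and S distinguish a source above the horizontal plane from its mirror image below it; the other six channels are identical for both. In the idealized free-field case, a horizontal circular array receives the same signals from both positions, so the signs of these three channels cannot be determined from the array signals alone.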
