NFDI4DS | UHH-SEMS - Publication Details

Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation

Separation (statistics) Source Separation

DOI: 10.21437/interspeech.2021-430

Publication Date: 2021-08-27T05:59:39Z

AUTHORS (5)

Yong Xu

Zhuohuang Zhang

Meng Yu

Shi-Xiong Zhang

Dong Yu

ABSTRACT

Although the conventional mask-based minimum variance distortionless response (MVDR) beamformer can reduce non-linear distortion, the residual noise level of MVDR-separated speech is still high. In this paper, we propose a spatio-temporal recurrent neural network based beamformer (RNN-BF) for target speech separation. This new beamforming framework directly learns the beamforming weights from the estimated speech and noise spatial covariance matrices. Leveraging the temporal modeling capability of RNNs, the RNN-BF automatically accumulates the statistics of the speech and noise covariance matrices to learn the frame-level beamforming weights in a recursive way. An RNN-based generalized eigenvalue (RNN-GEV) beamformer and a more generalized RNN beamformer (GRNN-BF) are proposed. We further improve the RNN-GEV and the GRNN-BF by using layer normalization to replace the commonly used mask normalization on the covariance matrices. The proposed GRNN-BF obtains better performance than prior art in terms of speech quality (PESQ), speech-to-noise ratio (SNR), and word error rate (WER).
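For orientation, the conventional mask-based MVDR baseline that the abstract contrasts against can be sketched as follows. This is not the proposed RNN-BF or GRNN-BF, only a minimal NumPy illustration of the standard closed-form solution for a single frequency bin. The function names (mask_covariance, mvdr_weights, beamform), the real-valued mask, the reference-microphone choice, and the diagonal loading constant are all assumptions made for this example and do not come from the paper.

```python
import numpy as np


def mask_covariance(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance for one frequency bin.

    stft: complex array of shape (channels, frames)
    mask: real array of shape (frames,), values in [0, 1] (assumed real-valued)
    """
    weighted = stft * mask  # mask broadcasts over the channel axis
    # Sum over frames of mask(t) * y(t) y(t)^H, normalized by the mask sum.
    return (weighted @ stft.conj().T) / (mask.sum() + eps)


def mvdr_weights(phi_ss, phi_nn, ref_mic=0, diag_load=1e-6):
    """Closed-form MVDR weights: (phi_nn^-1 phi_ss / trace(phi_nn^-1 phi_ss)) u.

    u is a one-hot vector selecting the reference microphone.
    diag_load is a hypothetical regularization constant for numerical stability.
    """
    n = phi_nn.shape[0]
    loaded = phi_nn + diag_load * (np.trace(phi_nn).real / n) * np.eye(n)
    numerator = np.linalg.solve(loaded, phi_ss)  # phi_nn^-1 phi_ss without an explicit inverse
    return numerator[:, ref_mic] / (np.trace(numerator) + 1e-12)


def beamform(weights, stft):
    """Apply w^H y(t) to every frame; returns an array of shape (frames,)."""
    return weights.conj() @ stft
```

Here the weights are computed once per frequency bin from covariance matrices averaged over frames. The abstract's proposal replaces this fixed closed form with RNNs that accumulate the covariance statistics and predict frame-level weights recursively.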

CITATIONS (28)
