
Sparsity-based phase spectrum compensation for single-channel speech source separation

Keywords: PESQ, Non-negative Matrix Factorization, Source Separation

DOI: 10.1016/j.dsp.2019.102632
Publication Date: 2019-12-02T16:55:11Z


AUTHORS (2)

Kwang Myung Jeon

Hong Kook Kim

ABSTRACT

This paper proposes a sparsity-based phase spectrum compensation (SPSC) function to improve the quality of signals reconstructed by magnitude-based single-channel speech source separation. While conventional approaches to the reconstruction of separated sources use the input phase spectrum, the proposed SPSC function modifies each source's phase spectrum using the estimated magnitude spectra of multiple sources and their spectro-temporal sparsity. In particular, the spectro-temporal sparsity is estimated from the signal-to-interference ratio between the magnitude spectrum of the source to be separated and those of the other sources, including the background noise. The proposed SPSC function is first embedded into the reconstruction stage of two magnitude-based speech source separation methods, one employing a deep recurrent neural network (DRNN) and the other sparse nonnegative matrix factorization (SNMF). Speech denoising is then performed under four different noise conditions with signal-to-noise ratios (SNRs) in the range of 0–15 dB. Objective and subjective tests both show that the DRNN- and SNMF-based speech separation methods employing the proposed SPSC function substantially outperform those employing the conventional PSC function on the speech denoising task. Moreover, the DRNN combined with the proposed SPSC function achieves higher average perceptual evaluation of speech quality (PESQ) scores for speech denoising than SEGAN, which is based on a very large end-to-end neural network model. Finally, the proposed SPSC function is applied to the reconstruction stage of a deep clustering-based speech separation method to examine its contribution to a multi-talker speech separation task, in which utterances spoken by males and females are mixed at −3, 0, and 3 dB SNR. The experiment shows that the proposed SPSC function further improves the signal-to-distortion ratio (SDR) and PESQ scores of the magnitude spectra processed by the deep clustering-based single-channel speech separation method.
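The two ingredients the abstract names as the starting point can be sketched in a few lines of NumPy: the conventional reconstruction that pairs an estimated magnitude spectrum with the unmodified input (mixture) phase, and a per-bin signal-to-interference ratio between one source's magnitude and the summed magnitudes of the others. This is only an illustrative sketch and not the SPSC function itself, whose definition is not given on this page. The function names, the magnitude-domain 20·log10 form of the ratio, and the toy data are all assumptions made for the example.

```python
import numpy as np


def per_bin_sir_db(target_mag, other_mags, eps=1e-12):
    """Per time-frequency-bin signal-to-interference ratio in dB.

    Assumption: the ratio is taken on magnitudes (hence 20*log10); the
    paper's exact definition may differ.
    target_mag: (freq, time) magnitude of the source of interest.
    other_mags: (n_others, freq, time) magnitudes of all other sources,
                including background noise.
    """
    interference = np.sum(other_mags, axis=0)
    return 20.0 * np.log10((target_mag + eps) / (interference + eps))


def reconstruct_with_input_phase(target_mag, mixture_spec):
    """Conventional reconstruction: estimated magnitude, mixture phase."""
    return target_mag * np.exp(1j * np.angle(mixture_spec))


# Toy spectrogram (hypothetical data): 4 frequency bins, 3 frames.
rng = np.random.default_rng(0)
mixture_spec = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# Pretend a separator attributed 80% of each bin's magnitude to speech
# and 20% to noise.
speech_mag = 0.8 * np.abs(mixture_spec)
noise_mag = 0.2 * np.abs(mixture_spec)

sir_db = per_bin_sir_db(speech_mag, np.stack([noise_mag]))
speech_spec = reconstruct_with_input_phase(speech_mag, mixture_spec)
```

In this sketch a bin with a high SIR is dominated by the target source. A phase compensation scheme of the kind described above would replace the plain `np.angle(mixture_spec)` term with a modified phase, which is the part left out here.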


REFERENCES (36)

CITATIONS (2)
