Sparsity-based phase spectrum compensation for single-channel speech source separation
PESQ
Non-negative Matrix Factorization
Source Separation
DOI:
10.1016/j.dsp.2019.102632
Publication Date:
2019-12-02T16:55:11Z
AUTHORS (2)
ABSTRACT
This paper proposes a sparsity-based phase spectrum compensation (SPSC) function to improve the quality of signals reconstructed by magnitude-based single-channel speech source separation. Whereas conventional approaches reconstruct each separated source with the phase spectrum of the input mixture, the proposed SPSC function modifies each source's phase spectrum using the estimated magnitude spectra of multiple sources and their spectro-temporal sparsity. In particular, the spectro-temporal sparsity is estimated from the signal-to-interference ratio (SIR) between the magnitude spectrum of the source to be separated and those of the other sources, including the background noise. The proposed SPSC function is first embedded into the reconstruction stage of two magnitude-based speech source separation methods, one employing a deep recurrent neural network (DRNN) and the other employing sparse nonnegative matrix factorization (SNMF). Speech denoising is then performed under four different noise conditions at signal-to-noise ratios (SNRs) in the range of 0–15 dB. Objective and subjective tests show that, for speech denoising, both the DRNN- and SNMF-based separation methods perform substantially better with the proposed SPSC function than with the conventional PSC function. Moreover, the DRNN combined with the proposed SPSC function achieves a higher average perceptual evaluation of speech quality (PESQ) score for speech denoising than SEGAN, which is based on a very large end-to-end neural network model. Finally, the proposed SPSC function is applied to the reconstruction stage of a deep clustering-based speech separation method to examine its contribution to a multi-talker speech separation task, in which utterances by male and female speakers are mixed at −3, 0, and 3 dB SNR.
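The abstract does not give the SPSC formula, so the following NumPy sketch is only an illustration under stated assumptions. It implements the standard PSC construction (a real, antisymmetric compensation term λΨ(k)|Î(k)| is added to the mixture spectrum before its phase is taken, and the real part of the inverse FFT is kept) and, separately, a hypothetical per-bin weight driven by the local SIR in the spirit of SPSC. The mapping λ = λ0 / (1 + SIR), the default λ0, and the function names `psc_sign`, `compensate_phase`, and `to_frames` are assumptions for illustration, not the authors' method.

```python
import numpy as np


def psc_sign(n_fft: int) -> np.ndarray:
    """Antisymmetric sign function Psi(k) used by PSC:
    +1 for 0 < k < N/2, -1 for N/2 < k < N, and 0 at k = 0 and k = N/2."""
    psi = np.zeros(n_fft)
    psi[1:(n_fft + 1) // 2] = 1.0
    psi[n_fft // 2 + 1:] = -1.0
    return psi


def compensate_phase(X, target_mag, interf_mag, lam0=1.0, sparsity=True, eps=1e-12):
    """Return target_mag * exp(j * angle(X + Lambda)) for full N-point spectra.

    X          : complex mixture STFT, shape (N, T), frequency on axis 0
    target_mag : estimated magnitude of the source to reconstruct, shape (N, T)
    interf_mag : estimated magnitude of all other sources (incl. noise), shape (N, T)
    lam0       : hypothetical compensation scale (not a value from the paper)
    sparsity   : False -> conventional PSC with a constant lambda;
                 True  -> hypothetical SIR-dependent lambda per time-frequency bin
    """
    psi = psc_sign(X.shape[0])[:, None]  # broadcast Psi(k) over all frames
    if sparsity:
        # Hypothetical mapping: local SIR = |S|^2 / |I|^2. Interference-dominated
        # bins (low SIR) get lambda near lam0; target-dominated bins get close to 0.
        sir = target_mag**2 / (interf_mag**2 + eps)
        lam = lam0 / (1.0 + sir)
    else:
        lam = lam0
    Lam = lam * psi * interf_mag       # real-valued, antisymmetric compensation term
    phase = np.angle(X + Lam)          # compensated phase spectrum
    return target_mag * np.exp(1j * phase)


def to_frames(Y):
    """Inverse FFT per frame. The compensated spectrum is no longer
    conjugate-symmetric, so (as in PSC) only the real part is kept."""
    return np.real(np.fft.ifft(Y, axis=0))


if __name__ == "__main__":
    # Toy example with oracle magnitudes; windowing and overlap-add are omitted.
    rng = np.random.default_rng(0)
    s = rng.standard_normal((16, 4))        # "clean" frames, N = 16, T = 4
    n = 0.5 * rng.standard_normal((16, 4))  # "noise" frames
    X = np.fft.fft(s + n, axis=0)
    Y = compensate_phase(X, np.abs(np.fft.fft(s, axis=0)), np.abs(np.fft.fft(n, axis=0)))
    print(to_frames(Y).shape)
```

In a real system, `target_mag` and `interf_mag` would come from the separator (for example DRNN, SNMF, or deep clustering outputs) and be mirrored to the full N-point spectrum; summing the other sources' magnitudes into a single interference term is itself a simplifying assumption.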
The experiment shows that the proposed SPSC function further improves the signal-to-distortion ratio (SDR) and PESQ scores of signals reconstructed from the magnitude spectra estimated by the deep clustering-based single-channel speech separation method.
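SDR in source-separation studies is commonly computed with the BSS Eval methodology, which decomposes the error into interference, noise, and artifact terms and allows a linear distortion filter on the reference; the abstract does not state which implementation was used. For intuition only, the sketch below computes two simpler quantities: a plain SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), and the scale-invariant SDR, which first projects the estimate onto the reference with a single optimal gain. Neither is guaranteed to reproduce the paper's numbers, and the function names are illustrative.

```python
import numpy as np


def plain_sdr(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    s = np.asarray(reference, dtype=float)
    s_hat = np.asarray(estimate, dtype=float)
    err = s - s_hat
    return 10.0 * np.log10((np.sum(s**2) + eps) / (np.sum(err**2) + eps))


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB: scale the reference by the least-squares
    gain alpha, then compare the energy of alpha*s with the residual energy."""
    s = np.asarray(reference, dtype=float)
    s_hat = np.asarray(estimate, dtype=float)
    alpha = np.dot(s_hat, s) / (np.dot(s, s) + eps)  # optimal gain onto the reference
    target = alpha * s
    err = s_hat - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(err**2) + eps))
```

The plain SDR penalizes a mere gain change (an estimate equal to 0.9·s scores 20 dB), whereas the scale-invariant form ignores it, which is one reason different SDR definitions can rank the same outputs differently.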