Interactive Speech and Noise Modeling for Speech Enhancement

DOI: 10.1609/aaai.v35i16.17710 Publication Date: 2022-09-08T20:14:46Z
ABSTRACT
Speech enhancement is challenging because of the diversity of background noise types. Most existing methods focus on modelling the speech rather than the noise. In this paper, we propose a novel idea: to model speech and noise simultaneously in a two-branch convolutional neural network, namely SN-Net. In SN-Net, the two branches predict speech and noise, respectively. Instead of fusing information only at the final output layer, interaction modules are introduced at several intermediate feature domains between the two branches so that they benefit each other. Such an interaction can leverage features learned from one branch to counteract the undesired part and restore the missing component of the other, and thus enhance their discrimination capabilities. We also design a feature extraction module, namely residual-convolution-and-attention (RA), to capture the correlations along the temporal and frequency dimensions for both the speech and the noises. Evaluations on public datasets show that the interaction module plays a key role in simultaneous modeling and that SN-Net outperforms the state-of-the-art by a large margin on various evaluation metrics. The proposed SN-Net also shows superior performance for speaker separation.
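The interaction between the two branches can be illustrated with a minimal sketch. The abstract does not give the exact form of the interaction module, so everything below is an assumption made for illustration: the function name `interact`, the scalar weights, and the gating rule (a sigmoid mask computed from both feature maps, applied to the other branch's features and added back as a residual). It is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def interact(target, other, w_target, w_other, bias):
    """Refine `target` features using features from the `other` branch.

    Hypothetical gating rule (an assumption, not taken from the paper):
    a per-bin mask in (0, 1) is computed from both feature maps, which is
    what a 1x1 convolution over their two-channel concatenation followed
    by a sigmoid would do. The masked `other` features are then added to
    `target` as a residual.
    """
    refined = []
    for t, o in zip(target, other):
        mask = sigmoid(w_target * t + w_other * o + bias)
        refined.append(t + mask * o)
    return refined


# Toy intermediate features: one channel, flattened time-frequency bins.
speech_feat = [0.8, 0.1, -0.3, 0.5]
noise_feat = [0.2, 0.9, 0.4, -0.1]

# Information flows in both directions, each with its own (made-up) weights.
# Both calls read the same inputs, so the exchange is simultaneous.
speech_next = interact(speech_feat, noise_feat, 0.5, -1.0, 0.0)
noise_next = interact(noise_feat, speech_feat, -1.0, 0.5, 0.0)
```

In a real network the mask weights would be learned convolution kernels and the exchange would be repeated at several intermediate layers; the sketch shows only one such exchange on a single channel.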