PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

Keywords: Spectrogram, Harmonic
DOI: 10.1609/aaai.v34i05.6489
Publication Date: 2020-06-17T08:27:54Z
ABSTRACT
Time-frequency (T-F) domain masking is a mainstream approach for single-channel speech enhancement. Recently, attention has turned to phase prediction in addition to amplitude prediction. In this paper, we propose a phase-and-harmonics-aware deep neural network (DNN), named PHASEN, for this task. Unlike previous methods, which directly use a complex ideal ratio mask to supervise DNN learning, we design a two-stream network in which an amplitude stream and a phase stream are dedicated to amplitude and phase prediction, respectively. We discover that the two streams should communicate with each other, and that this is crucial to phase prediction. In addition, we propose frequency transformation blocks to capture long-range correlations along the frequency axis. Visualization shows that the learned transformation matrix implicitly captures the harmonic correlation, which has been shown to be helpful for T-F spectrogram reconstruction. With these two innovations, PHASEN acquires the ability to handle detailed phase patterns and to utilize harmonic patterns, achieving a 1.76 dB SDR improvement on the AVSpeech + AudioSet dataset. It also achieves significant gains over Google's network. On the Voice Bank + DEMAND dataset, PHASEN outperforms previous methods by a large margin on four metrics.
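The abstract names two mechanisms: a learned matrix that mixes all frequency bins (so each output bin can draw on distant, e.g. harmonically related, bins), and separate amplitude and phase predictions that are combined into an enhanced T-F spectrogram. The NumPy sketch below illustrates both ideas under simplifying assumptions; it is not the authors' implementation. The function names `frequency_transform` and `apply_amplitude_and_phase` are hypothetical, the paper's frequency transformation block contains further layers not described in this abstract, and the rule used to combine the two streams' outputs (noisy magnitude times a real mask times a unit-magnitude predicted phase) is an assumption made here for illustration.

```python
import numpy as np


def frequency_transform(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Mix all frequency bins of a (T, F, C) feature map with an F x F matrix.

    Hypothetical simplification of a frequency transformation block:
    Y[t, f, c] = sum_k W[f, k] * X[t, k, c].
    Because every output bin can draw on every input bin, W can model
    long-range (for example harmonic) correlations along the frequency axis.
    """
    # Apply W along the frequency axis, independently for each frame and channel.
    return np.einsum("fk,tkc->tfc", w, x)


def apply_amplitude_and_phase(noisy: np.ndarray, amp_mask: np.ndarray,
                              phase: np.ndarray) -> np.ndarray:
    """Combine an amplitude mask and a predicted phase into a complex spectrogram.

    Assumption for illustration: the phase stream outputs complex values that are
    normalized to unit magnitude, and the output is |noisy| * amp_mask * unit_phase.
    """
    # Normalize the predicted phase to unit magnitude (guarding against division by zero).
    unit_phase = phase / np.maximum(np.abs(phase), 1e-8)
    # The amplitude comes from the masked noisy magnitude; the phase comes from the phase stream.
    return np.abs(noisy) * amp_mask * unit_phase


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 2))      # (frames, frequency bins, channels)
    w = rng.standard_normal((8, 8)) / 8     # stands in for a learned matrix
    print(frequency_transform(x, w).shape)  # (4, 8, 2)
```

With `w` set to the identity matrix, `frequency_transform` returns its input unchanged; any off-diagonal weight lets a bin borrow information from other bins, which is the property the abstract attributes to the learned transformation matrix.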