Speech preprocessing and enhancement based on joint time domain and time-frequency domain analysis

DOI: 10.1121/10.0026219 Publication Date: 2024-06-03T12:20:26Z
ABSTRACT
Speech enhancement aims to make noisy speech signals clearer. Traditional time-frequency domain methods struggle to differentiate between speech and noise, which risks distorting the speech. This paper introduces an approach that combines the time domain and the time-frequency domain in a W-net module that suppresses noise at the front end. The module, called TTF-W-Net, is an improved version of Wave-U-Net. We conducted experiments on the TIMIT speech and NOISEX-92 noise datasets to evaluate the enhancement performance obtained by integrating preprocessing networks, specifically Wave-U-Net and our TTF-W-Net, into the baseline methods Phase, FullSubNet+, and DB-AIAT. Experimental results show that TTF-W-Net outperforms the baseline Wave-U-Net by 15.7% on the perceptual evaluation of speech quality (PESQ) metric, and that the baseline networks perform better when our preprocessing method is applied. Consequently, the TTF-W-Net preprocessing network offers effective speech enhancement.
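
The two signal views that the abstract refers to can be illustrated with a minimal sketch. This is not TTF-W-Net or any of the baselines; it only shows, under assumed parameters (a Hann window, a 512-sample frame, and a 128-sample hop, none of which come from the paper), how a waveform in the time domain maps to a time-frequency representation and back. The function names `stft` and `istft` are hypothetical helpers written for this example.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Time-frequency view: Hann-windowed frames -> complex spectrum (frames x bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Back to the time domain by weighted overlap-add of the inverse-transformed frames."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    # Guard against division by ~0 at the signal edges, where the window vanishes.
    return out / np.maximum(norm, 1e-8)

if __name__ == "__main__":
    # Synthetic stand-in for noisy speech: a 440 Hz tone plus white noise (assumed 16 kHz rate).
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(t.size)

    spec = stft(noisy)          # time-frequency domain view
    magnitude = np.abs(spec)    # what many time-frequency methods operate on
    rebuilt = istft(spec)       # time domain view recovered from the spectrum
    print(noisy.shape, magnitude.shape, rebuilt.shape)
```

A time domain model such as Wave-U-Net operates directly on the waveform array, whereas time-frequency methods typically operate on the magnitude or complex spectrum; a joint approach has access to both views of the same signal.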