Deep Learning-based Environmental Sound Classification Using Feature Fusion and Data Enhancement

Keywords: Overfitting; Mel-frequency cepstrum; Spectrogram
DOI: 10.32604/cmc.2023.032719
Publication date: 2022-09-22
ABSTRACT
Environmental sound classification (ESC) is the process of identifying the class of an audio stream that may contain numerous environmental sounds. Common factors such as framework differences, overlapping events, and the presence of various sources during recording make the ESC task considerably more complex. This research proposes a deep learning model that improves the recognition rate of environmental sounds and reduces training time under limited computational resources. To this end, the performance of transformer and convolutional neural network (CNN) models is investigated. Seven features, namely chromagram, Mel-spectrogram, tonnetz, Mel-Frequency Cepstral Coefficients (MFCCs), delta MFCCs, delta-delta MFCCs, and spectral contrast, are extracted from the UrbanSound8K, ESC-50, and ESC-10 databases. Moreover, three data enhancement methods, namely white noise addition, pitch tuning, and time stretching, are employed to reduce the risk of overfitting caused by the limited number of audio clips. The evaluation experiments demonstrate that the best performance was achieved by the proposed model using all seven features on the enhanced database. For UrbanSound8K, ESC-50, and ESC-10, the highest attained accuracies are 0.98, 0.94, and 0.97, respectively. The experimental results show that the proposed technique achieves high accuracy on ESC problems.
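The three data enhancement methods can be illustrated with a small NumPy-only sketch. It assumes that "white noise" means additive Gaussian noise at a chosen signal-to-noise ratio, that "stretch" means phase-vocoder time stretching (duration changes, pitch does not), and that "pitch tuning" is a time stretch followed by resampling back to the original length, which is the approach libraries such as librosa take. The function names, the `snr_db` parameter, and the STFT settings (`n_fft=1024`, `hop=256`) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def add_white_noise(y, snr_db, rng=None):
    """Add zero-mean Gaussian white noise at a target signal-to-noise ratio (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def _stft(y, n_fft, hop):
    # Hann-windowed frames -> one-sided spectra, shape (n_frames, n_fft // 2 + 1).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([window * y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def _istft(spec, n_fft, hop):
    # Weighted overlap-add, normalized by the summed squared window.
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    y, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += window * frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    nonzero = norm > 1e-8
    y[nonzero] /= norm[nonzero]
    return y

def time_stretch(y, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch: rate > 1 shortens, rate < 1 lengthens; pitch is kept."""
    if len(y) < n_fft + hop:
        raise ValueError("signal too short for the chosen n_fft/hop")
    spec = _stft(y, n_fft, hop)
    n_frames, n_bins = spec.shape
    expected = 2 * np.pi * hop * np.arange(n_bins) / n_fft  # nominal phase advance per hop
    phase = np.angle(spec[0])
    steps = np.arange(0, n_frames - 1, rate)  # fractional input-frame positions
    out = np.empty((len(steps), n_bins), dtype=complex)
    for k, t in enumerate(steps):
        i, frac = int(t), t - int(t)
        # Interpolate magnitude between neighbouring input frames.
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Deviation from the nominal advance, wrapped to [-pi, pi].
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - expected
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + expected + dphi
    return _istft(out, n_fft, hop)

def pitch_shift(y, n_steps, n_fft=1024, hop=256):
    """Shift pitch by n_steps semitones: time-stretch, then resample back to len(y)."""
    rate = 2.0 ** (-n_steps / 12.0)
    stretched = time_stretch(y, rate, n_fft, hop)
    n_out = int(round(len(stretched) * rate))
    # Crude linear-interpolation resampling (no anti-aliasing filter).
    positions = np.linspace(0, len(stretched) - 1, n_out)
    shifted = np.interp(positions, np.arange(len(stretched)), stretched)
    shifted = np.pad(shifted, (0, max(0, len(y) - len(shifted))))
    return shifted[:len(y)]
```

Applying, for example, `add_white_noise(y, snr_db=20)`, `time_stretch(y, 1.2)` and `pitch_shift(y, 2)` to a training clip yields three extra labelled examples, which is how such augmentation enlarges a small dataset. The linear-interpolation resampler is deliberately simple; a real pipeline would typically use a band-limited resampler.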
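Fusing the seven features can be sketched as building one fixed-length vector per clip. This sketch assumes each feature is already available as a (coefficients × frames) matrix, for example from an audio library, and that fusion means averaging each matrix over time and concatenating the results; the paper may fuse the features differently. The delta and delta-delta MFCCs use the standard regression formula Δc_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²), here with N = 2 and edge padding at the clip boundaries. The function names `delta` and `fuse_seven_features` are hypothetical.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta along the time axis of a (n_coeffs, n_frames) matrix."""
    n_frames = features.shape[1]
    # Repeat the first/last frame so every frame has `width` neighbours on each side.
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[:, width + n:width + n + n_frames]   # c_{t+n}
        behind = padded[:, width - n:width - n + n_frames]  # c_{t-n}
        out += n * (ahead - behind)
    return out / denom

def fuse_seven_features(chroma, mel, tonnetz, mfcc, contrast):
    """Time-average each feature matrix and concatenate into one vector."""
    mfcc_d = delta(mfcc)       # delta MFCCs
    mfcc_dd = delta(mfcc_d)    # delta-delta MFCCs
    maps = [chroma, mel, tonnetz, mfcc, mfcc_d, mfcc_dd, contrast]
    return np.concatenate([m.mean(axis=1) for m in maps])
```

With hypothetical sizes of 12 chroma bins, 128 Mel bands, 6 tonnetz dimensions, 40 MFCCs and 7 spectral-contrast bands, the fused vector has 12 + 128 + 6 + 3 × 40 + 7 = 273 entries, which a CNN or transformer classifier can take as input. Averaging over time discards temporal order; keeping the full matrices and stacking them is an alternative when the classifier should see how features evolve.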