Self-Supervised Learning for Speech-Based Detection of Depressive States

DOI: 10.54097/1cspmj65
Publication Date: 2025-02-28
ABSTRACT
This study aims to enhance the accuracy of depression detection by leveraging representation learning from audio data. Annotated speech datasets for depression are sparse and costly to produce, so a self-supervised pre-training approach is employed to improve performance, generalization capability, and training efficiency on downstream tasks. When processing unlabeled data, however, the pre-trained representations may be degraded if the data contain a significant amount of noise or errors. Consequently, the downstream model must capture long-distance sequence dependencies while remaining resistant to such interference. Traditional LSTM models have limitations both in context extraction and in robustness to input outliers. Thus, an improved method named CNN-BiLSTM is proposed in this paper. The network initializes the LSTM's embedding layer with word vectors and extracts spatial and temporal features separately to ensure a full and complete expression of the useful information. Different weights are then assigned according to feature importance to obtain the fused features. Additionally, a random forest is used for classification to mitigate the risk of overfitting, as it performs well on high-dimensional features. Experimental results show that the proposed model performs well on the evaluation dataset, outperforming state-of-the-art methods from prior investigations.
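The abstract outlines a pipeline of a CNN branch for spatial features, a BiLSTM branch for temporal context, weighted fusion of the two, and a random-forest classifier. The sketch below illustrates that pipeline under stated assumptions: the layer sizes, fusion weights, and input shape (40-dimensional log-mel frames) are illustrative choices, not the authors' exact configuration, and the word-vector embedding initialization and self-supervised pre-training stages are omitted.

import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class CNNBiLSTMExtractor(nn.Module):
    """Hypothetical CNN-BiLSTM feature extractor with weighted fusion."""
    def __init__(self, n_mels=40, cnn_channels=64, lstm_hidden=128,
                 w_spatial=0.4, w_temporal=0.6):
        super().__init__()
        # CNN branch: captures local spectral (spatial) structure.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, cnn_channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # pool over time -> fixed-size vector
        )
        # BiLSTM branch: captures long-range temporal context in both directions.
        self.bilstm = nn.LSTM(input_size=n_mels, hidden_size=lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Project both branches to a common dimension before weighted fusion.
        self.proj_cnn = nn.Linear(cnn_channels, lstm_hidden)
        self.proj_lstm = nn.Linear(2 * lstm_hidden, lstm_hidden)
        self.w_spatial = w_spatial      # importance weight for CNN features (assumed)
        self.w_temporal = w_temporal    # importance weight for BiLSTM features (assumed)

    def forward(self, x):
        # x: (batch, time, n_mels) log-mel frames
        spatial = self.cnn(x.transpose(1, 2)).squeeze(-1)       # (batch, cnn_channels)
        temporal, _ = self.bilstm(x)                             # (batch, time, 2*hidden)
        temporal = temporal.mean(dim=1)                          # average over time
        fused = (self.w_spatial * self.proj_cnn(spatial)
                 + self.w_temporal * self.proj_lstm(temporal))   # weighted fusion
        return fused                                             # (batch, lstm_hidden)

# Example usage: extract fused features for placeholder utterances, then
# classify depressed vs. non-depressed states with a random forest.
extractor = CNNBiLSTMExtractor().eval()
dummy_audio_feats = torch.randn(8, 128, 40)      # 8 utterances, 128 frames, 40 mels
dummy_labels = [0, 1, 0, 1, 1, 0, 0, 1]          # placeholder binary labels
with torch.no_grad():
    feats = extractor(dummy_audio_feats).numpy()
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(feats, dummy_labels)
print(clf.predict(feats[:2]))

In this sketch the fused representation is handed to the random forest rather than a softmax head, mirroring the abstract's choice of a tree ensemble to reduce overfitting on high-dimensional features; in practice the extractor would first be pre-trained on unlabeled speech and then fine-tuned or frozen before fitting the classifier.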