- Speech and Audio Processing
- Speech Recognition and Synthesis
- Music and Audio Processing
- Natural Language Processing Techniques
- Telecommunications and Broadcasting Technologies
- Advanced Adaptive Filtering Techniques
- Infant Health and Development
South China University of Technology
2020-2023
Tencent (China)
2021
The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1, with a single microphone array, focusing on practical application with a real-time requirement; and 2) Task 2, with multiple distributed microphone arrays, which is a non-real-time track that does not have any constraints, so that participants could explore algorithms to obtain high quality. Targeting the real conferencing room application,...
Speaker clustering is the task of merging speech segments uttered by the same speaker into a single cluster, which is an effective tool for alleviating the management of massive amounts of audio documents. In this paper, we present our work on co-optimizing the two main steps of speaker clustering, namely, feature learning and cluster estimation. In our method, the deep representation is learned by a deep convolutional autoencoder network (DCAN), while cluster estimation is realized by a softmax layer that is combined with the DCAN. We devise an integrated loss function...
The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1, with a single microphone array, focusing on practical application with a real-time requirement; and 2) Task 2, with multiple distributed arrays, which is a non-real-time track that does not have any constraints, so that participants could explore algorithms to obtain high quality. Targeting the real conferencing room application, the database was...
Although many efforts have been made to decrease the model complexity for speaker verification, it is still challenging to deploy speaker verification systems with satisfactory results on low-resource terminals. We design a transformation module that performs feature partition and fusion to implement lightweight speaker verification. The transformation module consists of multiple simple but effective operations, such as convolution, pooling, mean, concatenation, normalization, and element-wise summation. It works in a plug-and-play way,...
In this paper, we propose a speaker verification method based on an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In the proposed method, the logarithm Mel spectrum is extracted from each speech recording and then fed to the proposed AMCRN for learning the speaker embedding. Afterwards, the learned speaker embedding is fed to a back-end classifier (such as a cosine similarity metric) for scoring in the testing stage. The proposed method is compared with state-of-the-art methods...
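The back-end scoring step mentioned in the abstract above can be illustrated with a minimal sketch of cosine-similarity scoring between two speaker embeddings. This is not the authors' implementation: the function names `cosine_score` and `verify`, the plain-list embedding format, and the threshold value are all assumptions made for illustration, and the embedding network itself is not shown.

```python
import math


def cosine_score(emb_a, emb_b):
    """Return the cosine similarity of two equal-length embedding vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def verify(enroll_emb, test_emb, threshold=0.5):
    """Score a trial and accept it if the score reaches the threshold.

    The default threshold is a hypothetical placeholder; in practice it
    would be tuned on held-out trials.
    """
    score = cosine_score(enroll_emb, test_emb)
    return score, score >= threshold
```

In such a setup, `enroll_emb` would be the embedding of the enrollment recording and `test_emb` the embedding of the test recording, both produced by the same embedding network.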