Zhongjie Jiang

ORCID: 0009-0009-7256-6592
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Telecommunications and Broadcasting Technologies
  • Advanced Adaptive Filtering Techniques
  • Infant Health and Development

South China University of Technology
2020-2023

Tencent (China)
2021

The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with a single microphone array, focusing on practical application with a real-time requirement, and 2) Task 2 is multi-channel speech enhancement with multiple distributed microphone arrays, which is a non-real-time track that does not have any constraints, so that participants could explore any algorithms to obtain high speech quality. Targeting the real conferencing room application,...

10.1109/asru51503.2021.9688126 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

Speaker clustering is a task to merge speech segments uttered by the same speaker into a single cluster, which is an effective tool for alleviating the management of massive amounts of audio documents. In this paper, we present a work on co-optimizing the two main steps of speaker clustering, namely, feature learning and cluster estimation. In our method, a deep representation is learned by a deep convolutional autoencoder network (DCAN), while cluster estimation is realized by a softmax layer that is combined with the DCAN. We devise an integrated loss function...

10.1109/tmm.2020.3024667 article EN IEEE Transactions on Multimedia 2020-09-21
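
The abstract above couples representation learning and cluster estimation in one network. Below is a minimal, hypothetical PyTorch sketch of that idea: a convolutional autoencoder learns the embedding, a softmax layer on the bottleneck estimates cluster assignments, and a joint loss ties the two together. The layer sizes, the `DCANClusterer` and `integrated_loss` names, and the entropy-based clustering term are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of a DCAN-style speaker clustering model: a convolutional
# autoencoder learns a deep representation, and a softmax layer attached to the
# bottleneck estimates cluster assignments. Sizes and loss weighting are
# illustrative, not the paper's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCANClusterer(nn.Module):
    def __init__(self, n_clusters: int, latent_dim: int = 128):
        super().__init__()
        # Encoder: spectrogram patch (1 x 64 x 64) -> latent code
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Decoder: latent code -> reconstructed spectrogram patch
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),
        )
        # Softmax head for cluster estimation on top of the learned embedding
        self.cluster_head = nn.Linear(latent_dim, n_clusters)

    def forward(self, x):
        z = self.encoder(x)
        recon = self.decoder(z)
        assign = F.softmax(self.cluster_head(z), dim=-1)
        return z, recon, assign

def integrated_loss(x, recon, assign, alpha: float = 0.1):
    """Joint objective: reconstruction error plus a clustering term.
    The entropy penalty (encouraging confident assignments) is a stand-in;
    the paper's actual integrated loss may be formulated differently."""
    recon_loss = F.mse_loss(recon, x)
    entropy = -(assign * torch.log(assign + 1e-8)).sum(dim=-1).mean()
    return recon_loss + alpha * entropy
```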

The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with a single microphone array, focusing on practical application with a real-time requirement, and 2) Task 2 is multi-channel speech enhancement with multiple distributed microphone arrays, which is a non-real-time track that does not have any constraints, so that participants could explore any algorithms to obtain high speech quality. Targeting the real conferencing room application, the database was...

10.48550/arxiv.2104.00960 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Although many efforts have been made on decreasing the model complexity for speaker verification, it is still challenging to deploy speaker verification systems with satisfactory results on low-resource terminals. We design a transformation module that performs feature partition and fusion to implement lightweight speaker verification. The module consists of multiple simple but effective operations, such as convolution, pooling, mean, concatenation, normalization, and element-wise summation. It works in a plug-and-play way,...

10.1109/taslp.2023.3338533 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2023-12-05
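
As a rough illustration of the lightweight transformation module described above, here is a hypothetical plug-and-play block built only from the cheap operations the abstract lists (convolution, pooling, concatenation, normalization, element-wise summation). The `PartitionFusion` name, the half-and-half channel split, and the specific branch choices are assumptions for the sketch; the paper's actual partitioning and fusion scheme may differ.

```python
# Minimal, hypothetical sketch of a plug-and-play "partition and fusion"
# transformation module built from inexpensive operations only.
import torch
import torch.nn as nn

class PartitionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        # Cheap per-partition transforms
        self.branch_a = nn.Conv2d(half, half, kernel_size=3, padding=1, groups=half)  # depthwise conv
        self.branch_b = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)              # pooling
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Partition the feature map along the channel dimension
        a, b = torch.chunk(x, 2, dim=1)
        a = self.branch_a(a)
        b = self.branch_b(b)
        # Fusion: concatenate the partitions, normalize, and add the input back
        fused = self.norm(torch.cat([a, b], dim=1))
        return fused + x  # element-wise summation keeps the module plug-and-play

# Usage: drop the module between existing layers of a speaker embedding network
feats = torch.randn(8, 64, 80, 200)   # (batch, channels, mel bins, frames)
module = PartitionFusion(channels=64)
out = module(feats)                    # same shape as the input
```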

In this paper, we propose a speaker verification method by an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In our method, the logarithm Mel spectrum is extracted from each speech recording and then fed to the AMCRN for learning the speaker embedding. Afterwards, the learned embedding is fed to the back-end classifier (such as the cosine similarity metric) for scoring in the testing stage. Compared with state-of-the-art methods,...

10.48550/arxiv.2306.00426 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01
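
The AMCRN abstract describes a standard verification pipeline around the proposed network: log Mel spectrum in, speaker embedding out, cosine similarity scoring at the back end. The sketch below shows that surrounding pipeline with torchaudio; `embedding_net` is a stand-in placeholder rather than the AMCRN architecture itself, and the 0.5 threshold is an arbitrary example value.

```python
# Sketch of the verification pipeline around the proposed AMCRN: extract a
# log Mel spectrum per recording, map it to a speaker embedding, then score
# an enrollment/test pair with cosine similarity.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)

def log_mel(waveform: torch.Tensor) -> torch.Tensor:
    """Logarithm Mel spectrum of a (1, samples) waveform."""
    return torch.log(mel(waveform) + 1e-6)

def cosine_score(emb_enroll: torch.Tensor, emb_test: torch.Tensor) -> float:
    """Back-end scoring with the cosine similarity metric."""
    return torch.nn.functional.cosine_similarity(emb_enroll, emb_test, dim=-1).item()

# embedding_net stands in for the trained AMCRN (not reproduced here)
embedding_net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.LazyLinear(192))

enroll_wav = torch.randn(1, 16000 * 3)    # placeholder 3 s recordings
test_wav = torch.randn(1, 16000 * 3)
e1 = embedding_net(log_mel(enroll_wav))
e2 = embedding_net(log_mel(test_wav))
accept = cosine_score(e1, e2) > 0.5       # threshold would be tuned on a dev set
```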

Although many efforts have been made on decreasing the model complexity for speaker verification, it is still challenging to deploy speaker verification systems with satisfactory results on low-resource terminals. We design a transformation module that performs feature partition and fusion to implement lightweight speaker verification. The module consists of multiple simple but effective operations, such as convolution, pooling, mean, concatenation, normalization, and element-wise summation. It works in a plug-and-play way,...

10.48550/arxiv.2312.03324 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01