An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings
Speaker diarisation
DOI: 10.1016/j.csl.2023.101534
Publication Date: 2023-05-30
AUTHORS (6)
ABSTRACT
We performed an experimental review of current diarization systems for the conversational telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms belonging to the clustering-based, end-to-end neural diarization (EEND), and speech separation guided diarization (SSGD) paradigms. We studied the inference-time computational requirements and the diarization accuracy on four CTS datasets with different characteristics and languages. We found that, among all the methods considered, EEND-vector clustering (EEND-VC) offers the best trade-off in terms of computing requirements and performance. More generally, EEND models have been found to be lighter and faster at inference than clustering-based methods. However, they also require a large amount of diarization-oriented annotated data. In particular, EEND-VC performance in our experiments degraded when the dataset size was reduced, whereas self-attentive EEND (SA-EEND) was less affected. We also found that SA-EEND gives less consistent results than EEND-VC, with its performance degrading on long conversations with high speech sparsity. Clustering-based systems, and in particular VBx, instead show more consistent performance than SA-EEND but are outperformed by EEND-VC. The gap with respect to EEND-VC is reduced when overlap-aware clustering methods are considered. SSGD is the most computationally demanding method, but it could be convenient if speech recognition has to be performed. Its performance is close to that of SA-EEND, but it degrades significantly when the training and inference data are less matched.
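The abstract compares systems by their diarization accuracy. A common way to quantify this is the diarization error rate (DER): missed speech, false-alarm speech and speaker confusion, divided by the total reference speaker time. The following is only a minimal frame-level sketch, assuming binary per-frame speaker-activity matrices, no forgiveness collar, and a brute-force search over speaker label mappings (cheap for two-speaker CTS). The function name `frame_der` and its input format are illustrative assumptions, not the scoring code used in the paper.

```python
from itertools import permutations

import numpy as np


def frame_der(ref, hyp):
    """Frame-level diarization error rate without a collar (illustrative sketch).

    ref, hyp: binary arrays of shape (num_frames, num_speakers); entry [t, s]
    is 1 when speaker s is active in frame t. Both arrays are assumed to have
    the same number of speaker columns (2 for two-speaker CTS).
    """
    ref = np.asarray(ref, dtype=int)
    hyp = np.asarray(hyp, dtype=int)
    if ref.shape != hyp.shape:
        raise ValueError("ref and hyp must have the same shape")
    total_ref = ref.sum()  # total reference speaker-frames (denominator)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    n_ref = ref.sum(axis=1)  # active reference speakers per frame
    n_hyp = hyp.sum(axis=1)  # active hypothesis speakers per frame
    best_error = None
    # Hypothesis speaker labels are arbitrary, so try every label mapping
    for perm in permutations(range(ref.shape[1])):
        mapped = hyp[:, list(perm)]
        n_correct = np.logical_and(ref == 1, mapped == 1).sum(axis=1)
        # Per frame: missed + false alarm + confusion = max(N_ref, N_hyp) - N_correct
        error = (np.maximum(n_ref, n_hyp) - n_correct).sum()
        if best_error is None or error < best_error:
            best_error = error
    return float(best_error / total_ref)


if __name__ == "__main__":
    # Toy two-speaker example; the last reference frame is overlapped speech
    ref = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
    hyp = [[0, 1], [0, 1], [1, 0], [1, 0], [0, 1]]
    # Swapped labels are resolved by the mapping search; only one speaker
    # is missed in the overlapped frame, giving 1/6
    print(frame_der(ref, hyp))
```

Overlapped frames count once per active speaker in the denominator, which is why overlap-aware systems can reduce missed speech that a single-label-per-frame output cannot recover.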
REFERENCES (111)
CITATIONS (3)