Vocal tract normalization in speech recognition: Compensating for systematic speaker variability
Vocal tract
Normalization
Spectrogram
Resampling
Speaker diarisation
Image warping
Dynamic Time Warping
DOI: 10.1121/1.411700
Publication Date: 2005-10-14T19:39:58Z
AUTHORS (3)
ABSTRACT
The performance of speech recognition systems is often improved by accounting explicitly for sources of variability in the data. In the SWITCHBOARD corpus, studied during the 1994 CAIP workshop [Frontiers in Speech Processing Workshop II, (August 1994)], an attempt was made to compensate for the systematic variability due to the different vocal tract lengths of the various speakers. The method found a maximum probability parameter for each speaker which mapped an acoustic model to the mean of models taken from a homogeneous speaker population. The underlying acoustic model was that of a straight tube, and the estimation was accomplished by warping the spectrum linearly over a 20% range (actually by digitally resampling the data), and finding the maximum a posteriori probability of the data given the warp. The technique produces statistically significant improvements in accuracy on a speech transcription task using four speech recognition systems. The best parametrizations were later found to correlate well with estimates computed manually from spectrograms.
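The warp search described in the abstract can be illustrated with a minimal sketch. It is not the workshop implementation. The names (`warp_signal`, `spectrum_features`, `estimate_warp`), the 0.90 to 1.10 grid in steps of 0.02, the normalized magnitude spectrum used as a toy feature, and the single diagonal Gaussian standing in for the recognizer's acoustic models are all assumptions made for illustration. With a flat prior over warp factors, the maximum a posteriori choice reduces to picking the warp with the highest likelihood.

```python
import numpy as np


def warp_signal(x, alpha):
    """Resample x so that y[n] = x(alpha * n), using linear interpolation.

    Played back at the original sample rate, every frequency in x is
    multiplied by alpha, i.e. the spectrum is warped linearly.
    """
    n_out = int(np.floor((len(x) - 1) / alpha)) + 1
    positions = alpha * np.arange(n_out)
    return np.interp(positions, np.arange(len(x)), x)


def spectrum_features(x, n_fft=2048):
    """Toy feature (assumption): unit-norm magnitude spectrum of one frame."""
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    frame = x[:n_fft] * np.hanning(n_fft)
    mag = np.abs(np.fft.rfft(frame))
    return mag / (np.linalg.norm(mag) + 1e-12)


def gaussian_loglik(feat, mean, var):
    """Log-likelihood of feat under a diagonal Gaussian (stand-in model)."""
    return float(-0.5 * np.sum((feat - mean) ** 2 / var + np.log(2 * np.pi * var)))


def estimate_warp(x, mean, var, alphas):
    """Grid search: return the warp factor whose warped data scores highest."""
    scores = [
        gaussian_loglik(spectrum_features(warp_signal(x, a)), mean, var)
        for a in alphas
    ]
    return float(alphas[int(np.argmax(scores))]), scores


# Synthetic demonstration (all values hypothetical): two tones stand in for
# the resonances of a "reference" speaker; the test speaker's resonances are
# lower by a factor of 1.06, as a longer straight tube would produce.
fs = 8000
t = np.arange(4096) / fs
true_alpha = 1.06
reference = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)
speaker = (np.sin(2 * np.pi * (500 / true_alpha) * t)
           + np.sin(2 * np.pi * (1500 / true_alpha) * t))

model_mean = spectrum_features(reference)  # "homogeneous population" mean
model_var = 1e-4                           # assumed shared variance
alphas = np.linspace(0.90, 1.10, 11)       # 20% range around no warp

best_alpha, scores = estimate_warp(speaker, model_mean, model_var, alphas)
print("selected warp factor:", best_alpha)
```

In this sketch the resampling step does the spectral warping implicitly, which mirrors the abstract's remark that the warp was carried out by digitally resampling the data. A real system would score the warped data with the recognizer's own acoustic models rather than a single Gaussian.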
REFERENCES (0)
CITATIONS (28)