Vocal tract normalization in speech recognition: Compensating for systematic speaker variability

Vocal tract; Normalization; Spectrogram; Resampling; Speaker diarisation; Image warping; Dynamic time warping
DOI: 10.1121/1.411700 Publication Date: 2005-10-14T19:39:58Z
ABSTRACT
The performance of speech recognition systems is often improved by accounting explicitly for sources of variability in the data. In the SWITCHBOARD corpus, studied during the 1994 CAIP workshop [Frontiers in Speech Processing Workshop II, CAIP (August 1994)], an attempt was made to compensate for the systematic variability due to the different vocal tract lengths of the various speakers. The method found a maximum probability parameter for each speaker which mapped an acoustic model to the mean of the models taken from a homogeneous speaker population. The underlying acoustic model was that of a straight tube, and the estimation of the parameter was accomplished by warping the spectrum linearly over a 20% range (actually accomplished by digitally resampling the data), and finding the maximum a posteriori probability of the data given the warp. The technique produces statistically significant improvements in accuracy on a speech transcription task using each of four different speech recognition systems. The best parametrizations were later found to correlate well with vocal tract estimates computed manually from spectrograms.
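The warp search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the 20% range is taken as warp factors from 0.90 to 1.10 in steps of 0.02, the acoustic model is replaced by a single diagonal Gaussian over spectral bins, the prior over warp factors is uniform (so the maximum a posteriori choice reduces to maximum likelihood), and the warp is applied by interpolating the frequency axis of each spectrum rather than by resampling the waveform. The names `warp_spectrum`, `log_likelihood`, and `estimate_warp` are hypothetical.

```python
import numpy as np


def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis: output bin k takes the value at bin alpha * k."""
    bins = np.arange(len(spectrum))
    # np.interp holds the edge value for queries beyond the last bin.
    return np.interp(alpha * bins, bins, spectrum)


def log_likelihood(frames, mean, var):
    """Log-likelihood of frames under a diagonal Gaussian.

    Hypothetical stand-in for the speaker-independent acoustic model.
    """
    diff = frames - mean
    return float(-0.5 * np.sum(diff ** 2 / var + np.log(2.0 * np.pi * var)))


def estimate_warp(frames, mean, var, alphas=None):
    """Grid search for the warp factor that best maps a speaker's frames onto the model."""
    if alphas is None:
        # Assumed grid: a 20% range centred on "no warp".
        alphas = np.linspace(0.90, 1.10, 11)
    scores = []
    for alpha in alphas:
        warped = np.array([warp_spectrum(frame, alpha) for frame in frames])
        scores.append(log_likelihood(warped, mean, var))
    best = int(np.argmax(scores))
    return float(alphas[best]), scores
```

In this sketch `frames` is an array of shape (number of frames, number of bins) holding one speaker's spectra, and the returned factor would be applied to all of that speaker's data before recognition. The grid step and the single-Gaussian scoring are illustrative choices only.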