Speech recognition system robust to noise and speaking styles
Mel-frequency cepstrum
DOI:
10.21437/interspeech.2004-726
Publication Date:
2021-08-24T07:06:48Z
AUTHORS (4)
ABSTRACT
It is difficult to recognize speech distorted by various factors, especially when an ASR system contains only a single acoustic model. One solution is to use multiple models, one for each condition. In this paper, we discuss a parallel decoding-based ASR system that is robust to noise type, SNR, speaker gender, and speaking style. Our system consists of two recognition channels based on MFCC and Differential MFCC (DMFCC) features. Each channel has several acoustic models depending on speaking style, adapted by fast adaptation. From each channel, a hypothesis is selected based on its likelihood. The final recognition result is obtained by combining the hypotheses from the two channels. We evaluate the performance of our system using normal and hyper-articulated test data contaminated by several noise types at various SNR levels. Experiments demonstrate that the system could achieve accuracy in excess of 80% for normal speaking style at 0 dB SNR. For hyper-articulated data, accuracy improved by about 10% to over 45% compared with a system without hyper-articulated speech models.
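Two ingredients the abstract names can be sketched in a few lines: the differential (delta) MFCC features that drive the second channel, and the per-channel selection of the most likely hypothesis. The sketch below is a minimal illustration under common assumptions, not the authors' implementation; the delta uses the standard regression formula over a +/-N frame window, and the function names are ours.

```python
import numpy as np

def delta(features, N=2):
    """Regression-based differential (delta) coefficients.

    features: (T, D) array of per-frame cepstral features (e.g. MFCCs).
    Returns an array of the same shape; edge frames reuse the
    boundary values (a common convention, assumed here).
    """
    T = features.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    d = np.zeros_like(features, dtype=float)
    for t in range(T):
        for n in range(1, N + 1):
            # weighted difference of frames n steps ahead and behind
            d[t] += n * (padded[t + N + n] - padded[t + N - n])
    return d / denom

def select_hypothesis(channel_hyps):
    """Pick the (hypothesis, log-likelihood) pair with the highest likelihood,
    as each recognition channel does before the final combination step."""
    return max(channel_hyps, key=lambda h: h[1])
```

For a linear ramp of features the delta is the constant slope away from the edges, which is a quick sanity check on the regression formula; `select_hypothesis([("word_a", -120.0), ("word_b", -95.5)])` returns the higher-likelihood pair.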