Computer-assisted pronunciation training—Speech synthesis is almost all you need
Pronunciation
DOI: 10.1016/j.specom.2022.06.003
Publication Date: 2022-06-22T15:19:59Z
AUTHORS (4)
ABSTRACT
The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high accuracy (only 60% precision at 40%-80% recall). One of the key problems is the low availability of mispronounced speech that is needed for the reliable training of pronunciation error detection models. If we had a generative model that could mimic non-native speech and produce any amount of training data, then the task of detecting pronunciation errors would be much easier. We present three innovative techniques based on phoneme-to-phoneme (P2P), text-to-speech (T2S), and speech-to-speech (S2S) conversion to generate correctly pronounced and mispronounced synthetic speech. We show that these techniques not only improve the accuracy of three machine learning models for detecting pronunciation errors but also help establish a new state-of-the-art in the field. Earlier studies have used simple speech generation techniques such as P2P conversion, but only as an additional mechanism to improve the accuracy of pronunciation error detection. We, on the other hand, consider speech generation to be a first-class method of detecting pronunciation errors. The effectiveness of these techniques is assessed in the tasks of detecting pronunciation and lexical stress errors. Non-native English speech corpora of German, Italian, and Polish speakers are used in the evaluations. The best proposed S2S technique improves the accuracy of detecting pronunciation errors in AUC metric by 41% from 0.528 to 0.749 compared to the state-of-the-art approach.
REFERENCES (79)
CITATIONS (24)