Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs

Emphasis (telecommunications) Utterance Paralanguage
DOI: 10.21437/interspeech.2015-727 Publication Date: 2021-08-27T05:58:44Z
ABSTRACT
In speech, emphasis is an important type of paralinguistic information that helps convey the focus of an utterance, new information, and emotion. If emphasis can be incorporated into a speech-to-speech (S2S) translation system, it will be possible to convey this information across the language barrier. However, previous related work focuses only on particular prosodic features, such as F0, or works only with extremely small vocabularies, such as the 10 digits. In this paper, we describe an S2S method that is able to translate emphasis across languages and to consider multiple features of emphasis, such as power, F0, and duration, over larger vocabularies. We do so by introducing two components: word-level emphasis estimation using linear regression hidden semi-Markov models (HSMMs), and emphasis translation, which translates the word-level emphasis to the target language using conditional random fields. The text-to-speech synthesis system is also modified to synthesize emphasized speech. The results show that our system translates emphasis correctly, with a 91.6% F-measure in the objective test and 87.8% in the subjective test.
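As a rough illustration of the F-measure figure reported above, the following sketch scores predicted word-level emphasis against a reference. It assumes one binary label per word (1 = emphasized, 0 = not) and the standard balanced F-measure. The abstract does not state the exact scoring protocol, so the function name `f_measure` and the label encoding are hypothetical.

```python
def f_measure(reference, predicted):
    """Balanced F-measure over binary word-level emphasis labels.

    Assumption (not from the paper): each word carries a label of
    1 (emphasized) or 0 (not emphasized), and both sequences are
    aligned word by word.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r == 1 and p == 1)
    false_pos = sum(1 for r, p in pairs if r == 0 and p == 1)
    false_neg = sum(1 for r, p in pairs if r == 1 and p == 0)

    # With no correctly predicted emphasized word, the score is defined as 0.
    if true_pos == 0:
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical five-word utterance: words 2 and 5 are emphasized in the
    # reference; the system marks words 2 and 3.
    reference = [0, 1, 0, 0, 1]
    predicted = [0, 1, 1, 0, 0]
    print(f"F-measure: {f_measure(reference, predicted):.3f}")
```

In this toy case, one emphasized word is found, one is missed, and one is falsely marked, so precision and recall are both 0.5.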