ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling
Utterance
Contextualization
Robustness
DOI: 10.18653/v1/2021.ecnlp-1.3
Publication Date: 2021-07-27T01:42:51Z
AUTHORS (3)
ABSTRACT
Automatic Speech Recognition (ASR) robustness toward slot entities is critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech. In this paper, we investigate various techniques to improve contextualization, content word robustness, and domain adaptation of a Transformer-XL neural language model (NLM) used to rescore ASR N-best hypotheses. To improve contextualization, we utilize turn-level dialogue acts along with cross-utterance context carry-over. Additionally, to adapt our domain-general NLM towards e-commerce on-the-fly, we use embeddings derived from a masked LM finetuned on in-domain data. Finally, to improve robustness towards in-domain content words, we propose a multi-task model that can jointly perform content word detection and language modeling tasks. Compared to a non-contextual LSTM LM baseline, our best performing NLM rescorer results in a content WER reduction of 19.2% on an e-commerce audio test set and a slot labeling F1 improvement of 6.4%.
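As a rough illustration of what N-best rescoring means, the sketch below re-ranks first-pass ASR hypotheses by adding a weighted language-model score to each hypothesis's ASR score. It is a minimal sketch under stated assumptions, not the paper's implementation: the log-linear combination, the `lm_weight` parameter, the function names, and the toy unigram LM (standing in for the Transformer-XL NLM) are all hypothetical choices made for illustration, and the example hypotheses and scores are invented.

```python
import math
from typing import Callable, Dict, List, Tuple

# A hypothesis is (text, first-pass ASR log-score). Assumed representation.
Hypothesis = Tuple[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank hypotheses by ASR score + lm_weight * LM log-probability.

    Returns (text, combined_score) pairs sorted best first.
    """
    scored = [(text, asr + lm_weight * lm_logprob(text)) for text, asr in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def make_toy_unigram_lm(
    counts: Dict[str, int], unk_count: float = 0.5
) -> Callable[[str], float]:
    """Build a toy unigram LM; a stand-in for a neural LM scorer."""
    total = sum(counts.values()) + unk_count

    def logprob(text: str) -> float:
        # Sum of per-word log-probabilities; unseen words get unk_count mass.
        return sum(math.log(counts.get(w, unk_count) / total) for w in text.split())

    return logprob


# Invented in-domain counts and a two-item N-best list for demonstration.
toy_lm = make_toy_unigram_lm({"order": 5, "two": 4, "pizzas": 3, "to": 4, "pieces": 1})
nbest = [("order to pieces", -1.0), ("order two pizzas", -1.2)]

best_text, best_score = rescore_nbest(nbest, toy_lm, lm_weight=0.5)[0]
print(best_text)
```

In this toy setup the first-pass ASR prefers "order to pieces", while the in-domain LM score moves "order two pizzas" to the top. A contextual rescorer of the kind the abstract describes would additionally condition `lm_logprob` on previous turns and dialogue acts, which this sketch omits.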