Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR
FOS: Computer and information sciences
Sound (cs.SD)
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
DOI:
10.21437/interspeech.2020-2556
Publication Date:
2020-10-27T05:22:11Z
AUTHORS (6)
ABSTRACT
The Transformer has shown impressive performance in automatic speech recognition. It uses an encoder-decoder structure with self-attention to learn the relationship between the high-level representation of source inputs and the embedding of target outputs. In this paper, we propose a novel decoder structure that features self-and-mixed attention (SMAD) with a deep acoustic structure (DAS) to improve Transformer-based LVCSR. Specifically, we introduce a self-attention mechanism to learn a multi-layer deep acoustic structure that captures multiple levels of acoustic abstraction. We also design a mixed attention mechanism that learns the alignment between each level of acoustic abstraction and its corresponding linguistic information simultaneously in a shared embedding space. The ASR experiments on Aishell-1 show that the proposed structure achieves CERs of 4.8% on the dev set and 5.1% on the test set, which are the best reported results on this task to our knowledge.
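The mixed attention described in the abstract can be illustrated with a minimal sketch: the decoder query attends over acoustic and linguistic representations jointly, by projecting both into one shared space and concatenating them along the time axis before a standard scaled dot-product attention. This is a hedged toy illustration of the general idea, not the authors' exact formulation; all function names, shapes, and the single-head setup here are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mixed_attention(queries, acoustic, linguistic, d_k):
    """Toy single-head mixed attention (illustrative, not the paper's exact design).

    queries:    (Tq, d)  decoder-side target embeddings
    acoustic:   (Ta, d)  one level of acoustic abstraction (shared space)
    linguistic: (Tl, d)  corresponding linguistic representations (shared space)

    Keys/values from both modalities are concatenated, so a single
    attention distribution aligns the query to acoustic and linguistic
    information simultaneously.
    """
    keys = np.concatenate([acoustic, linguistic], axis=0)   # (Ta+Tl, d)
    values = keys
    scores = queries @ keys.T / np.sqrt(d_k)                # (Tq, Ta+Tl)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ values, weights

# toy shapes: 4 target steps, 6 acoustic frames, 4 linguistic embeddings, d = 8
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
a = rng.standard_normal((6, 8))
l = rng.standard_normal((4, 8))
out, w = mixed_attention(q, a, l, d_k=8)
print(out.shape, w.shape)  # (4, 8) (4, 10)
```

In the paper's setting, a multi-layer deep acoustic structure would supply several such acoustic inputs (one per abstraction level), each paired with mixed attention in the decoder; the sketch shows only one level.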