Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
DOI: 10.21437/interspeech.2020-2556 Publication Date: 2020-10-27T05:22:11Z
ABSTRACT
The Transformer has shown impressive performance in automatic speech recognition. It uses an encoder-decoder structure with self-attention to learn the relationship between the high-level representation of source inputs and the embedding of target outputs. In this paper, we propose a novel decoder structure that features self-and-mixed attention (SMAD) with a deep acoustic structure (DAS) to improve Transformer-based LVCSR. Specifically, we introduce a self-attention mechanism to learn a multi-layer deep acoustic structure for multiple levels of acoustic abstraction. We also design a mixed attention mechanism that learns the alignment between different levels of acoustic abstraction and its corresponding linguistic information simultaneously in a shared embedding space. The ASR experiments on Aishell-1 show that the proposed structure achieves CERs of 4.8% on the dev set and 5.1% on the test set, which are the best reported results on this task to our knowledge.
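The abstract's "mixed attention in a shared space" idea can be illustrated with a minimal sketch. This is not the authors' code: the layer sizes, the concatenation-based shared key/value space, and the omission of the causal mask are all assumptions made for brevity. The sketch shows one decoder layer whose queries come from the text stream while its keys and values span both acoustic and linguistic representations.

```python
# Illustrative sketch (assumed details, not the paper's implementation) of a
# decoder layer combining self-attention over target embeddings with a "mixed"
# attention that attends jointly to acoustic and linguistic representations.
import torch
import torch.nn as nn

class SelfAndMixedAttentionLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        # Self-attention over the target (text) stream; the causal mask a real
        # ASR decoder needs is omitted here for brevity.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # "Mixed" attention: queries are the text stream, keys/values are the
        # concatenation of acoustic and text features, so acoustic-linguistic
        # alignment is learned in one shared embedding space (an assumption
        # about how the shared space is realized).
        self.mixed_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, text, acoustic):
        # text: (B, T_txt, d_model), acoustic: (B, T_ac, d_model)
        x, _ = self.self_attn(text, text, text)
        text = self.norm1(text + x)
        shared = torch.cat([acoustic, text], dim=1)  # shared key/value space
        x, _ = self.mixed_attn(text, shared, shared)
        text = self.norm2(text + x)
        return self.norm3(text + self.ffn(text))

layer = SelfAndMixedAttentionLayer()
out = layer(torch.randn(2, 10, 256), torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 10, 256])
```

Stacking several such layers, each drawing its acoustic keys/values from a different encoder depth, would correspond to attending over "multiple levels of acoustic abstraction" as described above.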