Iterative Compression of End-to-End ASR Model Using AutoML

DOI: 10.21437/interspeech.2020-1894 Publication Date: 2020-10-27T09:22:11Z
ABSTRACT
Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that an AutoML-based Low Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selection approaches. However, we show that current search techniques only work up to a certain compression level, beyond which they fail to produce compressed models with acceptable word error rates (WER). In this work, we propose an iterative LRF approach that achieves over 5x compression without degrading the WER, thereby advancing the state-of-the-art in ASR model compression.
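The core idea behind Low Rank Factorization can be sketched in a few lines: a dense layer's weight matrix W (m x n) is replaced by two smaller factors obtained from a truncated SVD, cutting parameters from m*n to r*(m + n). The sketch below is illustrative only, with a hand-picked rank r and arbitrary matrix sizes; the paper's contribution is having AutoML search for per-layer ranks (iteratively), which this toy example does not attempt.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Truncated-SVD factorization: W (m x n) ~= U_r (m x rank) @ V_r (rank x n).

    Replacing a dense layer W with the pair (U_r, V_r) reduces both the
    parameter count and the matmul cost from m*n to rank*(m + n).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

# Hypothetical layer shape; real ASR encoder/decoder layers differ.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))

U_r, V_r = low_rank_factorize(W, rank=32)
orig_params = W.size                 # 512 * 256 = 131072
lrf_params = U_r.size + V_r.size     # 32 * (512 + 256) = 24576
print(f"compression: {orig_params / lrf_params:.1f}x")
```

Choosing the rank is the hard part in practice: too low and the approximation error shows up as WER degradation, too high and the compression gain vanishes, which is exactly the trade-off the AutoML rank search is meant to navigate.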