MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI:
10.18653/v1/2020.acl-main.195
Publication Date:
2020-07-29T14:14:43Z
AUTHORS (6)
ABSTRACT
Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic; that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7 (0.6 lower than BERT_BASE) and 62 ms latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).
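Because MobileBERT is task-agnostic, adapting it to a downstream task is ordinary BERT-style fine-tuning of the released checkpoint. The sketch below is a minimal example, assuming the Hugging Face transformers library, PyTorch, and the public google/mobilebert-uncased checkpoint; the sentence pair, label, and hyperparameters are illustrative only, not the paper's training setup.

    # Minimal fine-tuning sketch for MobileBERT on a sentence-pair classification task.
    # Assumes: pip install torch transformers, and network access to download the checkpoint.
    import torch
    from transformers import MobileBertTokenizer, MobileBertForSequenceClassification

    tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
    model = MobileBertForSequenceClassification.from_pretrained(
        "google/mobilebert-uncased", num_labels=2
    )

    # Encode a toy NLI-style sentence pair (premise, hypothesis).
    inputs = tokenizer(
        "A soccer game with multiple males playing.",
        "Some men are playing a sport.",
        return_tensors="pt",
    )
    labels = torch.tensor([1])  # hypothetical label id: 1 = entailment

    # One fine-tuning step: forward pass with loss, backward pass, optimizer update.
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()

    # Predictions come from the argmax over the classification logits.
    prediction = outputs.logits.argmax(dim=-1)

The same pattern applies to other downstream tasks (e.g. question answering with a span-prediction head) by swapping the task head, which is what the abstract means by applying MobileBERT "via simple fine-tuning".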
CITATIONS (254)