Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Representation
Language Understanding
Deep Neural Networks
Natural language understanding
DOI:
10.48550/arxiv.1903.12136
Publication Date:
2019-03-28
AUTHORS (6): Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin
ABSTRACT
In the natural language processing literature, neural networks are becoming increasingly deeper and more complex. The recent poster child of this trend is the deep language representation model, which includes BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstrate that rudimentary, lightweight neural networks can still be made competitive without architecture changes, external training data, or additional input features. We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks. Across multiple datasets in paraphrasing, natural language inference, and sentiment classification, we achieve comparable results with ELMo, while using roughly 100 times fewer parameters and 15 times less inference time.
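The sketch below illustrates the kind of task-specific distillation the abstract describes: a single-layer BiLSTM student trained to match the output logits of a fine-tuned BERT teacher, blending hard-label cross-entropy with a logit-matching term. It is a minimal illustration, not the authors' released code; the module names, hyperparameters (`alpha`, embedding and hidden sizes), and the random stand-in data are assumptions for demonstration.

```python
# Minimal sketch (PyTorch) of distilling a BERT teacher's logits into a
# single-layer BiLSTM student. All sizes and names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMStudent(nn.Module):
    """Single-layer BiLSTM classifier used as the lightweight student."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=150, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)        # (batch, seq, embed_dim)
        _, (h_n, _) = self.bilstm(embedded)         # h_n: (2, batch, hidden_dim)
        sentence_repr = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(sentence_repr)       # logits: (batch, num_classes)


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend hard-label cross-entropy with MSE against the teacher's logits."""
    hard_loss = F.cross_entropy(student_logits, labels)
    soft_loss = F.mse_loss(student_logits, teacher_logits)
    return alpha * hard_loss + (1.0 - alpha) * soft_loss


# Illustrative training step with random stand-in data; in practice the
# teacher logits would come from a fine-tuned BERT run over the same batch.
student = BiLSTMStudent(vocab_size=30000)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

token_ids = torch.randint(1, 30000, (8, 40))        # batch of tokenized sentences
labels = torch.randint(0, 2, (8,))                  # gold labels
teacher_logits = torch.randn(8, 2)                  # placeholder for BERT outputs

student_logits = student(token_ids)
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
optimizer.step()
```

For sentence-pair tasks, the same student would be applied in a siamese configuration (one shared BiLSTM encoding each sentence, with the two representations combined before classification), but the distillation objective stays the same.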