Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Representation
Language Understanding
Deep Neural Networks
Natural language understanding
DOI:
10.48550/arxiv.1903.12136
Publication Date:
2019-03-28
AUTHORS (6): Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin
ABSTRACT
In the natural language processing literature, neural networks are becoming increasingly deeper and more complex. The recent poster child of this trend is the deep language representation model, which includes BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstrate that rudimentary, lightweight neural networks can still be made competitive without architecture changes, external training data, or additional input features. We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks. Across multiple datasets in paraphrasing, natural language inference, and sentiment classification, we achieve comparable results with ELMo, while using roughly 100 times fewer parameters and 15 times less inference time.
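The sketch below illustrates the kind of task-specific distillation the abstract describes: a single-layer BiLSTM student trained to match the output logits of a fine-tuned BERT teacher, blending hard-label cross-entropy with a logit-matching term. It is a minimal illustration, not the authors' released code; the module names, hyperparameters (`alpha`, embedding and hidden sizes), and the random stand-in data are assumptions for demonstration.

```python
# Minimal sketch (PyTorch) of distilling a BERT teacher's logits into a
# single-layer BiLSTM student. All sizes and names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMStudent(nn.Module):
    """Single-layer BiLSTM classifier used as the lightweight student."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=150, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)        # (batch, seq, embed_dim)
        _, (h_n, _) = self.bilstm(embedded)         # h_n: (2, batch, hidden_dim)
        sentence_repr = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(sentence_repr)       # logits: (batch, num_classes)


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend hard-label cross-entropy with MSE against the teacher's logits."""
    hard_loss = F.cross_entropy(student_logits, labels)
    soft_loss = F.mse_loss(student_logits, teacher_logits)
    return alpha * hard_loss + (1.0 - alpha) * soft_loss


# Illustrative training step with random stand-in data; in practice the
# teacher logits would come from a fine-tuned BERT run over the same batch.
student = BiLSTMStudent(vocab_size=30000)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

token_ids = torch.randint(1, 30000, (8, 40))        # batch of tokenized sentences
labels = torch.randint(0, 2, (8,))                  # gold labels
teacher_logits = torch.randn(8, 2)                  # placeholder for BERT outputs

student_logits = student(token_ids)
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
optimizer.step()
```

For sentence-pair tasks, the same student would be applied in a siamese configuration (one shared BiLSTM encoding each sentence, with the two representations combined before classification), but the distillation objective stays the same.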