LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
DOI:
10.1609/aaai.v35i14.17518
Publication Date:
2022-09-08T19:56:51Z
AUTHORS (7)
ABSTRACT
Pre-training models such as BERT have achieved great results on various natural language processing problems. However, their large number of parameters requires significant amounts of memory and inference time, which makes it difficult to deploy them on edge devices. In this work, we propose a knowledge distillation method, LRC-BERT, based on contrastive learning to fit the output of the intermediate layer from the angular distance aspect, which is not considered by existing distillation methods. Furthermore, we introduce a gradient perturbation-based training architecture in the training phase to increase the robustness of LRC-BERT, which is the first such attempt in knowledge distillation. Additionally, in order to better capture the distribution characteristics of the intermediate layer, we design a two-stage training method for the total distillation loss. Finally, by verifying on 8 datasets of the General Language Understanding Evaluation (GLUE) benchmark, the performance of the proposed LRC-BERT exceeds the existing state-of-the-art methods, which proves the effectiveness of our method.
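A minimal sketch of the kind of angular-distance contrastive objective the abstract describes for intermediate-layer distillation, assuming an InfoNCE-style loss computed on cosine similarities between student and teacher hidden states; the function name, tensor shapes, and temperature parameter are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def angular_contrastive_loss(student_h, teacher_h, negative_h, temperature=0.1):
    """Contrastive distillation loss on intermediate representations,
    measured by angular (cosine) similarity rather than L2 distance.

    student_h:  (batch, dim)        student intermediate-layer outputs
    teacher_h:  (batch, dim)        matching teacher outputs (positives)
    negative_h: (batch, n_neg, dim) teacher outputs for other samples (negatives)
    """
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    n = F.normalize(negative_h, dim=-1)

    # Cosine similarity with the matching teacher state: (batch, 1)
    pos_sim = (s * t).sum(dim=-1, keepdim=True)
    # Cosine similarity with each negative teacher state: (batch, n_neg)
    neg_sim = torch.einsum("bd,bkd->bk", s, n)

    # Treat index 0 (the positive) as the correct class in a softmax over similarities.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

In this sketch, the matching teacher hidden state serves as the positive and teacher states from other samples serve as negatives; the paper's actual loss, two-stage training schedule, and gradient-perturbation step are detailed in the full text.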
CITATIONS (31)