LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

DOI: 10.1609/aaai.v35i14.17518 Publication Date: 2022-09-08T19:56:51Z
ABSTRACT
Pre-training models such as BERT have achieved great results on various natural language processing problems. However, their large number of parameters requires significant amounts of memory and inference time, which makes them difficult to deploy on edge devices. In this work, we propose a knowledge distillation method, LRC-BERT, based on contrastive learning to fit the output of the intermediate layer from the angular distance aspect, which is not considered by existing distillation methods. Furthermore, we introduce a gradient perturbation-based training architecture in the training phase to increase the robustness of LRC-BERT, which is the first such attempt in knowledge distillation. Additionally, in order to better capture the distribution characteristics of the intermediate layer, we design a two-stage training method for the total distillation loss. Finally, by evaluating on 8 datasets of the General Language Understanding Evaluation (GLUE) benchmark, we show that the performance of the proposed LRC-BERT exceeds existing state-of-the-art methods, which proves the effectiveness of our method.
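The abstract describes two technical ingredients: a contrastive objective that fits student intermediate-layer outputs to the teacher's from an angular (cosine) distance aspect, and gradient perturbation during training. As a minimal sketch of the first idea only, the snippet below implements a generic contrastive loss over student and teacher hidden states using cosine similarity with in-batch negatives; the function name, temperature value, and in-batch negative scheme are illustrative assumptions, not the paper's exact formulation, and the gradient-perturbation and two-stage training components are not shown.

```python
import torch
import torch.nn.functional as F

def angular_contrastive_loss(student_h: torch.Tensor,
                             teacher_h: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss on intermediate-layer outputs using cosine (angular)
    similarity: for each sample, the teacher representation of the same sample
    is the positive, and teacher representations of the other samples in the
    batch serve as negatives. (Illustrative sketch, not the paper's exact loss.)"""
    s = F.normalize(student_h, dim=-1)   # unit vectors, so dot product = cosine similarity
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.t() / temperature     # (batch, batch) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)  # diagonal = matching pairs
    return F.cross_entropy(logits, targets)

# Toy usage with random hidden states (batch of 8, hidden size 768).
if __name__ == "__main__":
    student = torch.randn(8, 768)
    teacher = torch.randn(8, 768)
    print(angular_contrastive_loss(student, teacher).item())
```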