Towards Zero-Shot Knowledge Distillation for Natural Language Processing

Topics: Benchmarking, Knowledge Transfer, Transfer Learning
DOI: 10.18653/v1/2021.emnlp-main.526 Publication Date: 2021-12-17T03:56:42Z
ABSTRACT
Knowledge distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the teacher's training data for knowledge transfer to the student network. However, privacy concerns, data regulations and proprietary reasons may prevent access to such data. We present, to the best of our knowledge, the first work on Zero-shot Knowledge Distillation for NLP, where the student learns from the much larger teacher without any task-specific data. Our solution combines out-of-domain data and adversarial training to learn the teacher's output distribution. We investigate six tasks from the GLUE benchmark and demonstrate that we can achieve between 75% and 92% of the teacher's classification score (accuracy or F1) while compressing the model 30 times.
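For context, below is a minimal sketch of the standard soft-label distillation objective that zero-shot KD builds on: the student matches the teacher's temperature-scaled output distribution via a KL-divergence loss. The paper's specific contribution (generating transfer data from out-of-domain text plus adversarial training) is not reproduced here, and the function and parameter names are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KD loss: KL divergence between the temperature-scaled
    teacher and student output distributions (Hinton et al., 2015)."""
    # Teacher probabilities and student log-probabilities at temperature T
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable to a hard-label loss
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

if __name__ == "__main__":
    # Random logits standing in for teacher/student outputs on a 2-class GLUE task
    student_logits = torch.randn(8, 2)
    teacher_logits = torch.randn(8, 2)
    print(distillation_loss(student_logits, teacher_logits).item())
```

In the zero-shot setting described in the abstract, the inputs fed to both models would come from out-of-domain or adversarially constructed text rather than the teacher's original task-specific training set.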