Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm
Overfitting
Pruning
Benchmark (surveying)
DOI:
10.48550/arxiv.2110.08190
Publication Date:
2021-01-01
AUTHORS (9)
ABSTRACT
Conventional wisdom in pruning Transformer-based language models is that pruning reduces the model's expressiveness and thus is more likely to cause underfitting rather than overfitting. However, under the trending pretrain-and-finetune paradigm, we postulate a counter-traditional hypothesis, that is: pruning increases the risk of overfitting when performed at the fine-tuning phase. In this paper, we aim to address the overfitting problem and improve pruning performance via progressive knowledge distillation with error-bound properties. We show for the first time that reducing the risk of overfitting can help the effectiveness of pruning under the pretrain-and-finetune paradigm. Ablation studies and experiments on the GLUE benchmark show that our method outperforms the leading competitors across different tasks.
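
To make the setting concrete, below is a minimal, hypothetical sketch (PyTorch) of the general idea the abstract describes: a model is pruned progressively during fine-tuning while a dense teacher supervises the sparse student through knowledge distillation. All names and hyperparameters here (cubic_sparsity, magnitude_prune, T, alpha, the toy networks) are illustrative assumptions, not the authors' implementation, and the paper's error-bound analysis is not reproduced.

# Sketch: progressive pruning during fine-tuning with knowledge distillation.
# A dense "teacher" guides a progressively sparsified "student".
import torch
import torch.nn as nn
import torch.nn.functional as F


def cubic_sparsity(step, total_steps, final_sparsity=0.9):
    """Progressive sparsity schedule: ramps from 0 to final_sparsity."""
    t = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - t) ** 3)


def magnitude_prune(model, sparsity):
    """Zero out the smallest-magnitude weights in each Linear layer."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(sparsity * w.numel())
            if k > 0:
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft-label KL to the dense teacher."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# Toy usage: tiny classifiers stand in for the Transformer teacher/student.
teacher = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).eval()
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

total_steps = 100
for step in range(total_steps):
    x = torch.randn(8, 16)
    y = torch.randint(0, 2, (8,))
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Raise sparsity gradually so pruning does not shock the fine-tuned weights.
    magnitude_prune(student, cubic_sparsity(step, total_steps))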