Knowledge Distillation for Efficient Sequences of Training Runs

FOS: Computer and information sciences
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.48550/arxiv.2303.06480
Publication Date: 2023-01-01
ABSTRACT
In many practical scenarios -- like hyperparameter search or continual retraining with new data -- related training runs are performed many times in sequence. Current practice is to train each of these models independently from scratch. We study the problem of exploiting the computation invested in previous runs to reduce the cost of future runs using knowledge distillation (KD). We find that augmenting future runs with KD from previous runs dramatically reduces the time necessary to train these models, even taking into account the overhead of KD. We improve on these results with two strategies that reduce the overhead of KD by 80-90% with minimal effect on accuracy, yielding substantial Pareto improvements in overall cost. We conclude that KD is a promising avenue for reducing the cost of the expensive preparatory work that precedes training final models in practice.
This paper was accepted to the ICML 2022 workshop "Pre-training: Perspectives, Pitfalls, and Paths Forward".
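
A minimal sketch of the generic KD setup the abstract refers to, assuming a PyTorch training loop: a model checkpointed from a previous run acts as a frozen teacher, and the new (student) run adds a distillation term to its loss. The function names, the temperature, and the mixing weight alpha are illustrative assumptions rather than details from the paper, and the paper's two overhead-reduction strategies are not shown.

    # Illustrative only: standard knowledge distillation from a previous run's model.
    # Names (kd_loss, train_step, temperature, alpha) are assumptions, not the paper's code.
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
        # Blend the usual cross-entropy on labels with a KL term that matches
        # the teacher's softened output distribution (Hinton-style KD).
        ce = F.cross_entropy(student_logits, labels)
        kl = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        return alpha * ce + (1.0 - alpha) * kl

    def train_step(student, teacher, batch, optimizer):
        # One optimization step of the new run, guided by the previous run's model.
        inputs, labels = batch
        teacher.eval()
        with torch.no_grad():  # the teacher from the previous run stays frozen
            teacher_logits = teacher(inputs)
        student_logits = student(inputs)
        loss = kd_loss(student_logits, teacher_logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In a setup like this, the extra cost of KD is the additional teacher forward pass on every batch; that per-batch overhead is what the abstract reports reducing by 80-90%.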