Distilling Object Detectors with Task Adaptive Regularization
DOI:
10.48550/arxiv.2006.13108
Publication Date:
2020-01-01
AUTHORS (5)
ABSTRACT
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices. Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization. In this paper, we investigate each module of a typical detector in depth, and propose a general distillation framework that adaptively transfers knowledge from teacher to student according to task specific priors. The intuition is that simply distilling all information from the teacher to the student is not advisable; instead, we should only borrow priors from the teacher model where the student cannot perform well. Towards this goal, we propose a region proposal sharing mechanism to interflow region responses between the teacher and student models. Based on this, knowledge is adaptively transferred at three levels, \emph{i.e.}, feature backbone, classification head, and bounding box regression head, according to which model performs more reasonably. Furthermore, considering that minimizing the distillation loss and the detection loss simultaneously would introduce an optimization dilemma, we propose a distillation decay strategy to help improve model generalization via gradually reducing the distillation penalty. Experiments on widely used detection benchmarks demonstrate the effectiveness of our method. In particular, using Faster R-CNN with FPN as an instantiation, we achieve an accuracy of $39.0\%$ with Resnet-50 on the COCO dataset, which surpasses the baseline of $36.3\%$ by $2.7\%$ points, and is even better than the teacher model's $38.5\%$ mAP.
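The abstract's distillation decay strategy can be pictured as a training loss whose distillation term fades out over time, so the student follows the teacher early on and optimizes the pure detection objective toward the end. Below is a minimal sketch of that idea only; the linear schedule and the names (distill_decay_weight, det_loss, distill_loss, total_steps) are illustrative assumptions, not the authors' implementation.

```python
def distill_decay_weight(step: int, total_steps: int, w0: float = 1.0) -> float:
    """Linearly decay the distillation-loss weight from w0 toward 0 (assumed schedule)."""
    return w0 * max(0.0, 1.0 - step / total_steps)


def training_loss(det_loss: float, distill_loss: float,
                  step: int, total_steps: int) -> float:
    """Detection loss plus a gradually reduced distillation penalty."""
    return det_loss + distill_decay_weight(step, total_steps) * distill_loss


if __name__ == "__main__":
    # Toy usage: the distillation term contributes less as training progresses.
    for step in (0, 5000, 10000):
        print(step, training_loss(det_loss=1.2, distill_loss=0.8,
                                  step=step, total_steps=10000))
```

The same decayed-weight pattern applies to whichever of the three transfer levels (backbone features, classification head, box regression head) the distillation loss is computed over.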