Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

DOI: 10.48550/arXiv.2210.06707 Publication Date: 2022-10
ABSTRACT
Large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory costs when deployed on resource-constrained devices. Among the powerful compression approaches, quantization drastically reduces computation and memory consumption through low-bit parameters and bit-wise operations. However, low-bit ViTs remain largely unexplored and usually suffer a significant performance drop compared with their real-valued counterparts. In this work, through extensive empirical analysis, we first identify that the bottleneck behind the severe drop comes from the information distortion of the quantized self-attention map. We then develop an information rectification module (IRM) and a distribution guided distillation (DGD) scheme for fully quantized vision transformers (Q-ViT) to effectively eliminate such distortion, leading to accurate, fully quantized ViTs. We evaluate our methods on the popular DeiT and Swin backbones. Extensive experimental results show that our method achieves much better performance than prior arts. For example, Q-ViT theoretically accelerates ViT-S by 6.14x while achieving about 80.9% Top-1 accuracy, even surpassing the full-precision counterpart by 1.0% on the ImageNet dataset. Our code and models are available at https://github.com/YanjingLi0202/Q-ViT
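As a rough sketch of what low-bit quantization of self-attention involves, the following PyTorch snippet applies a symmetric per-tensor uniform quantizer to the query and key activations before computing the attention map whose information distortion the abstract identifies as the accuracy bottleneck. This is a minimal illustration under assumed details (bit width, per-tensor scaling, straight-through estimator), not the paper's IRM or DGD method; the authors' actual implementation is in the linked repository.

import torch

def quantize(x, bits=4):
    # Symmetric uniform quantizer with a per-tensor scale; an assumed
    # baseline scheme, not the paper's exact quantizer.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    # Straight-through estimator: quantized values in the forward pass,
    # identity gradient in the backward pass.
    return x + (q * scale - x).detach()

def quantized_attention_map(q, k, bits=4):
    # Quantize Q and K, then form the softmax attention map that the
    # paper analyzes for information distortion.
    qq, kq = quantize(q, bits), quantize(k, bits)
    scores = qq @ kq.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1)

# Toy usage: one head, 8 tokens, 64-dim embeddings.
q = torch.randn(1, 8, 64)
k = torch.randn(1, 8, 64)
attn = quantized_attention_map(q, k, bits=4)
print(attn.shape)  # torch.Size([1, 8, 8])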