CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
DOI:
10.48550/arXiv.2405.17233
Publication Date:
2024-05-27
AUTHORS (7)
ABSTRACT
Parameter quantization for Large Language Models (LLMs) has attracted increasing attention recently for reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantization (CLAQ) framework by introducing three different types of adaptive strategies for LLM quantization. Firstly, a K-Means clustering based algorithm is proposed that allows dynamic generation of quantization centroids for each column of the parameter matrix. Secondly, we design an outlier-guided adaptive precision search strategy which can dynamically assign varying bit-widths to different columns. Finally, an outlier reservation scheme is developed to retain some parameters in their original floating point precision, trading off memory cost for boosted model performance. Experiments on various mainstream open source LLMs including LLaMA-1, LLaMA-2 and Yi demonstrate that our methods achieve state-of-the-art results across different bit settings, especially in extremely low-bit scenarios. Code will be released soon.
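For illustration, the minimal sketch below mimics the three strategies the abstract describes: per-column K-Means centroids, an outlier-guided per-column bit allocation, and full-precision reservation of the largest-magnitude weights. It is not the authors' released implementation; the function names (kmeans_1d, assign_bits, quantize_column) and the kurtosis-based column scoring used as a stand-in for the paper's precision search are illustrative assumptions.

```python
import numpy as np

def kmeans_1d(values, n_centroids, n_iters=25):
    """Plain 1-D K-Means: returns centroids and per-value assignments."""
    # Initialize centroids from quantiles so clusters start well spread.
    centroids = np.quantile(values, np.linspace(0.0, 1.0, n_centroids))
    for _ in range(n_iters):
        assign = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_centroids):
            members = values[assign == k]
            if members.size:
                centroids[k] = members.mean()
    # Final assignment against the converged centroids.
    assign = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assign

def assign_bits(W, lo=2, hi=4, hi_frac=0.25):
    """Illustrative stand-in for outlier-guided precision search:
    columns with the heaviest-tailed value distributions (highest
    kurtosis) receive the higher bit-width."""
    kurt = ((W - W.mean(0)) ** 4).mean(0) / (W.var(0) ** 2 + 1e-8)
    n_hi = max(1, int(hi_frac * W.shape[1]))
    bits = np.full(W.shape[1], lo)
    bits[np.argsort(kurt)[-n_hi:]] = hi
    return bits

def quantize_column(col, n_bits, outlier_frac=0.01):
    """Quantize one weight column: reserve the largest-magnitude
    fraction at full precision, K-Means-quantize the rest."""
    n_keep = max(1, int(outlier_frac * col.size))
    mask = np.zeros(col.size, dtype=bool)
    mask[np.argsort(np.abs(col))[-n_keep:]] = True  # reserved outliers
    centroids, assign = kmeans_1d(col[~mask], 2 ** n_bits)
    deq = col.copy()
    deq[~mask] = centroids[assign]  # outliers stay in floating point
    return deq

def quantize_matrix(W, outlier_frac=0.01):
    """Column-level quantization of a whole weight matrix."""
    bits = assign_bits(W)
    cols = [quantize_column(W[:, j], bits[j], outlier_frac)
            for j in range(W.shape[1])]
    return np.stack(cols, axis=1)

# Example: quantize a random 128x64 weight matrix at a 2/4-bit mix.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)).astype(np.float32)
W_q = quantize_matrix(W)
print("mean squared error:", float(np.mean((W - W_q) ** 2)))
```

Operating column-by-column, as above, is what lets centroids and bit-widths adapt to each column's value distribution instead of sharing one grid across the whole matrix.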