When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization

FOS: Computer and information sciences
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.48550/arXiv.2411.05882 Publication Date: 2024-01-01
ABSTRACT
Contemporary machine learning models, such as language models, are powerful but come with immense resource requirements at both training and inference time. It has been shown that decoder-only language models can be trained to a competitive state with ternary weights (1.58 bits per weight), facilitating efficient inference. Here, we start our exploration with non-transformer model architectures, investigating 1.58-bit training for multi-layer perceptrons and graph neural networks. Then, we explore 1.58-bit training in other transformer-based language models, namely encoder-only and encoder-decoder models. Our results show that in all of these settings, 1.58-bit training is on par with, or sometimes even better than, the standard 32/16-bit models.
Comments: 10 pages, 2 tables, 6 figures
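
The 1.58 figure is the information content of a ternary weight: each weight takes one of three values {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits. As a point of reference, below is a minimal Python/PyTorch sketch of the absmean ternary quantizer used in the BitNet b1.58 line of work that this paper builds on; the function name, epsilon value, and straight-through-estimator note are illustrative assumptions, not details taken from this paper.

import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to ternary values {-1, 0, +1} plus a scale.

    Sketch of the absmean scheme from the BitNet b1.58 literature:
    scale by the mean absolute weight, then round and clip to {-1, 0, +1}.
    (Function name and eps are illustrative.)
    """
    gamma = w.abs().mean().clamp(min=eps)         # per-tensor scale
    w_ternary = (w / gamma).round().clamp(-1, 1)  # values in {-1, 0, +1}
    return w_ternary, gamma

# Usage: in quantization-aware training, the forward pass would typically use
# w_ternary * gamma while gradients flow to the full-precision w via a
# straight-through estimator.
w = torch.randn(256, 256)
w_q, gamma = absmean_ternary_quantize(w)
assert set(w_q.unique().tolist()) <= {-1.0, 0.0, 1.0}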