Efficient Neural Compression with Inference-time Decoding

Neural decoding
DOI: 10.48550/arxiv.2406.06237 Publication Date: 2024-06-10
ABSTRACT
This paper explores the combination of neural network quantization and entropy coding for memory footprint minimization. Edge deployment of quantized models is hampered by the harsh Pareto frontier of the accuracy-to-bitwidth tradeoff, which causes a dramatic accuracy loss below a certain bitwidth. This can be alleviated by mixed-precision quantization, which allows a more flexible bitwidth allocation. However, the benefits of standard mixed precision remain limited by the 1-bit frontier, which forces each parameter to be encoded on at least 1 bit of data. This paper introduces an approach that combines mixed precision, zero-point quantization and entropy coding to push the compression boundary of ResNets beyond the 1-bit frontier, with a 1% accuracy drop on the ImageNet benchmark. From an implementation standpoint, a compact decoder architecture features reduced latency and thus enables inference-compatible decoding.
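
ILLUSTRATIVE SKETCH
The abstract's central claim, that entropy coding can push the storage cost of quantized weights below the 1-bit frontier, can be illustrated with a small self-contained sketch. This is not the paper's code: the function names, the 2-bit setting, and the synthetic Gaussian weights are assumptions chosen only to show the mechanism. Zero-point quantization concentrates most weights on the zero-point symbol, so the empirical entropy of the quantized symbols, i.e. the average cost an ideal entropy coder would pay, falls well below the fixed bitwidth.

# Minimal sketch (assumptions: NumPy only, synthetic Gaussian weights,
# hypothetical helper names; not the paper's actual pipeline).
import numpy as np

def zero_point_quantize(w, n_bits=2):
    """Uniform affine (zero-point) quantization of a weight tensor."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = np.round(qmin - w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.int64)
    return q, scale, zero_point

def empirical_entropy_bits(q):
    """Average bits/symbol achievable by an ideal entropy coder."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sharply peaked weights, loosely mimicking a trained conv layer.
    w = rng.normal(0.0, 0.02, size=1_000_000)
    q, scale, zp = zero_point_quantize(w, n_bits=2)
    print("fixed-width cost :", 2.0, "bits/param")
    print("entropy-coded    :", round(empirical_entropy_bits(q), 2), "bits/param")

On such a peaked distribution the entropy-coded cost lands well under 1 bit per parameter, while a fixed 2-bit encoding cannot go below 2 bits; the paper's inference-compatible decoder is what makes recovering the weights from such a compressed stream practical at runtime.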