Compressed-Language Models for Understanding Compressed File Formats: a JPEG Exploration

JPEG
DOI: 10.48550/arxiv.2405.17146
Publication Date: 2024-05-27
ABSTRACT
This study investigates whether Compressed-Language Models (CLMs), i.e. language models operating on raw byte streams from Compressed File Formats (CFFs), can understand files compressed by CFFs. We focus on the JPEG format as a representative CFF, given its commonality and its representativeness of key concepts in compression, such as entropy coding and run-length encoding. We test if CLMs understand the JPEG format by probing their capabilities to perform along three axes: recognition of inherent file properties, handling of files with anomalies, and generation of new files. Our findings demonstrate that CLMs can effectively perform these tasks. These results suggest that CLMs can understand the semantics of compressed data when directly operating on the byte streams of files produced by CFFs. The possibility to directly operate on raw compressed files offers the promise to leverage some of their remarkable characteristics, such as their ubiquity, compactness, multi-modality and segment-nature.
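To make "operating on raw byte streams" and "segment-nature" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a byte-level vocabulary in which every byte value is one token, and it walks the marker-delimited segments that make up a JPEG header. The names `byte_tokens`, `list_segments`, and `TOY_STREAM` are hypothetical, and the toy stream only mimics JPEG segment structure; it is not a decodable image.

```python
import struct

# A few standard JPEG marker codes (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xDB: "DQT", 0xC0: "SOF0",
    0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI",
}

def byte_tokens(data: bytes) -> list[int]:
    """Map each byte to an integer token in [0, 255] (assumed vocabulary)."""
    return list(data)

def list_segments(data: bytes) -> list[tuple[str, int]]:
    """Return (marker name, payload length) pairs for the header segments.

    Parsing stops at SOS, because entropy-coded data follows it and is
    not organised as length-prefixed segments.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = [("SOI", 0)]
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xD9:  # EOI carries no length field
            segments.append(("EOI", 0))
            break
        if pos + 4 > len(data):
            raise ValueError("truncated segment header")
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        name = MARKER_NAMES.get(marker, f"0x{marker:02X}")
        segments.append((name, length - 2))
        if marker == 0xDA:
            break
        pos += 2 + length
    return segments

# Toy stream (assumption for illustration): SOI, one APP0 segment, EOI.
TOY_STREAM = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xd9"
)

print(byte_tokens(TOY_STREAM)[:4])
print(list_segments(TOY_STREAM))
```

A model trained on such token sequences sees the compressed representation directly, without any decoding step; the segment walk only illustrates the structure that the byte stream carries.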