Corra: Correlation-Aware Column Compression
FOS: Computer and information sciences
Computer Science - Databases
Databases (cs.DB)
DOI:
10.48550/arxiv.2403.17229
Publication Date:
2024-03-25
AUTHORS (4)
ABSTRACT
Column encoding schemes have witnessed a spark of interest lately. This is not surprising -- as data volume increases, being able to keep one's dataset in main memory for fast processing is a coveted desideratum. However, it also seems that single-column encoding schemes have reached a plateau in terms of the compression size they can achieve. We argue that this is because they do not exploit correlations in the data. Consider for instance the column pair ($\texttt{city}$, $\texttt{zip-code}$) of the DMV dataset: a city has only a few dozen unique zip codes. Such information, if properly exploited, can significantly reduce the space consumption of the latter column. In this work, we depart from the established, well-trodden path of compressing data using single-column encoding schemes and introduce $\textit{correlation-aware}$ encoding schemes. We demonstrate their advantages compared to single-column encoding schemes on the well-known TPC-H's $\texttt{lineitem}$, LDBC's $\texttt{message}$, DMV, and Taxi. For example, we obtain a saving rate of 58.3% for $\texttt{lineitem}$'s $\texttt{shipdate}$, while the dropoff timestamps in Taxi witness a saving rate of 30.6%.
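The ($\texttt{city}$, $\texttt{zip-code}$) observation can be illustrated with a small sketch. The code below is not taken from the paper and should not be read as one of its encoding schemes; it is a minimal, assumed illustration in which each zip code is stored as an index into a dictionary that is local to its city. The function names (`bits_needed`, `encode_dependent`, `decode_dependent`) and the sample rows are hypothetical.

```python
from math import ceil, log2

def bits_needed(n_distinct):
    """Bits per value needed to index n_distinct codes (at least 1)."""
    return max(1, ceil(log2(n_distinct)))

def encode_dependent(ref_col, dep_col):
    """Encode dep_col as indexes into a dictionary kept per ref_col value.

    Illustrative only: assumes the reference column (e.g. city) is stored
    separately and is available again at decode time.
    """
    local_dicts = {}  # reference value -> list of distinct dependent values
    positions = {}    # (reference value, dependent value) -> local index
    codes = []
    for ref, dep in zip(ref_col, dep_col):
        key = (ref, dep)
        if key not in positions:
            values = local_dicts.setdefault(ref, [])
            positions[key] = len(values)
            values.append(dep)
        codes.append(positions[key])
    return local_dicts, codes

def decode_dependent(ref_col, local_dicts, codes):
    """Rebuild the dependent column from the reference column and codes."""
    return [local_dicts[ref][code] for ref, code in zip(ref_col, codes)]

# Made-up sample rows, not drawn from the DMV dataset.
cities = ["Albany", "Albany", "Buffalo", "Buffalo", "Albany", "Syracuse"]
zips = ["12203", "12208", "14201", "14202", "12203", "13202"]

local_dicts, codes = encode_dependent(cities, zips)
global_bits = bits_needed(len(set(zips)))
local_bits = bits_needed(max(len(v) for v in local_dicts.values()))
print(codes, global_bits, local_bits)
```

On these sample rows a single global dictionary over zip codes needs 3 bits per value, while the per-city indexes fit in 1 bit. The sketch ignores the space taken by the per-city dictionaries themselves, which a real scheme would have to account for.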