DualTable: A Hybrid Storage Model for Update Optimization in Hive
Distributed data store
Storage model
DOI: 10.48550/arxiv.1404.6878
Publication Date: 2014-01-01
AUTHORS (9)
ABSTRACT
Hive is the most mature and prevalent data warehouse tool providing a SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems, as in Smart Grid applications, usually require complicated business logic and involve many data manipulation operations like updates and deletes. Hive cannot offer sufficient support for these while preserving high query performance. Hive using the Hadoop Distributed File System (HDFS) for storage cannot implement data manipulation efficiently, and Hive on HBase suffers from poor query performance even though it can support faster data manipulation. There is a project based on Hive issue Hive-5317 to support update operations, but it has not been finished in Hive's latest version. Since this ACID compliant extension adopts the same data storage format on HDFS, the update performance problem is not solved. In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS and the random write capability of HBase. Hive on DualTable provides better data manipulation support and preserves query performance at the same time. Experiments on a TPC-H data set and on a real smart grid data set show that DualTable is up to 10 times faster than Hive when executing update and delete operations.
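The hybrid idea in the abstract (a sequentially scanned bulk store paired with a random-write store for changes) can be illustrated with a small in-memory sketch. This is not the paper's implementation: the class `HybridTable`, its methods, and the `compact` step are hypothetical names chosen for illustration, a Python list stands in for HDFS files, and a dict stands in for HBase.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]

# Sentinel marking a deleted row in the delta store (a tombstone).
_DELETED = object()


class HybridTable:
    """Toy model: a scan-only base table plus a random-access delta map.

    Assumption: every name here is hypothetical and only illustrates the
    general "base plus delta" pattern, not DualTable's actual design.
    """

    def __init__(self, base_rows: Iterable[Tuple[int, Row]]) -> None:
        # Stand-in for bulk files that are only read sequentially.
        self._base = [(key, dict(row)) for key, row in base_rows]
        # Stand-in for a store that supports cheap random writes.
        self._delta: Dict[int, object] = {}

    def update(self, key: int, **changes: object) -> None:
        # Record changed columns only; the base rows are never rewritten.
        # Keys absent from the base are not validated in this sketch.
        pending = self._delta.get(key, {})
        if pending is _DELETED:
            raise KeyError(f"row {key} was deleted")
        self._delta[key] = {**pending, **changes}

    def delete(self, key: int) -> None:
        # A delete is a tombstone write, again without touching the base.
        self._delta[key] = _DELETED

    def scan(self) -> Iterator[Tuple[int, Row]]:
        # Stream the base in order and overlay pending changes per row.
        for key, row in self._base:
            change = self._delta.get(key)
            if change is _DELETED:
                continue
            yield key, ({**row, **change} if change else row)

    def compact(self) -> None:
        # Fold the delta into a fresh base so later scans skip the overlay.
        self._base = [(key, dict(row)) for key, row in self.scan()]
        self._delta.clear()


table = HybridTable([
    (1, {"meter": "A", "kwh": 10}),
    (2, {"meter": "B", "kwh": 20}),
])
table.update(1, kwh=15)   # random write, no rewrite of the base
table.delete(2)           # tombstone
print(list(table.scan()))
```

In this sketch, writes cost a single dict operation regardless of table size, while reads pay a per-row lookup in the delta until `compact` folds the changes back into the base. That trade-off is the general motivation for separating the two stores.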