Rover: An online Spark SQL tuning service via generalized transfer learning

SPARK (programming language) Bayesian Optimization Transfer of learning Performance tuning
DOI: 10.48550/arxiv.2302.04046 Publication Date: 2023-01-01
ABSTRACT
Distributed data analytic engines like Spark are common choices to process massive data in industry. However, the performance of Spark SQL highly depends on the choice of configurations, where the optimal ones vary with the executed workloads. Among various alternatives for Spark SQL tuning, Bayesian optimization (BO) is a popular framework that finds near-optimal configurations given sufficient budget, but it suffers from the re-optimization issue and is not practical in real production. When applying transfer learning to accelerate the tuning process, we notice two domain-specific challenges: 1) most previous work focuses on transferring tuning history, while expert knowledge from Spark engineers has great potential to improve tuning performance but has not been well studied so far; 2) history tasks should be carefully utilized, since using dissimilar ones leads to deteriorated performance in production. In this paper, we present Rover, a deployed online Spark SQL tuning service for efficient and safe search on industrial workloads. To address the challenges, we propose generalized transfer learning to boost tuning performance based on external knowledge, including expert-assisted Bayesian optimization and controlled history transfer. Experiments on public benchmarks and real-world tasks show the superiority of Rover over competitive baselines. Notably, Rover saves an average of 50.1% of the memory cost on 12k real-world Spark SQL tasks in 20 iterations, among which 76.2% of the tasks achieve a significant memory reduction of over 60%.
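The second challenge in the abstract, that dissimilar history tasks can hurt tuning, can be illustrated with a small similarity-gated warm start. This is only a sketch under stated assumptions, not Rover's actual algorithm: the task feature vectors, the cosine-similarity measure, the 0.8 threshold, the history entries, and the function names are all hypothetical. The two configuration keys are standard Spark settings, but their values here are made up.

```python
import math

# Hypothetical tuning history: each past task has a feature vector and
# the best configuration found for it. All values are illustrative.
HISTORY = [
    {"name": "etl_daily", "features": [0.9, 0.1, 0.3],
     "best_config": {"spark.executor.memory": "4g",
                     "spark.sql.shuffle.partitions": 200}},
    {"name": "report_join", "features": [0.8, 0.2, 0.4],
     "best_config": {"spark.executor.memory": "6g",
                     "spark.sql.shuffle.partitions": 400}},
    {"name": "ml_features", "features": [0.0, 1.0, 0.0],
     "best_config": {"spark.executor.memory": "16g",
                     "spark.sql.shuffle.partitions": 64}},
]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_warm_starts(target_features, history, min_similarity=0.8, top_k=2):
    """Return best configs of the most similar past tasks.

    Tasks below `min_similarity` are dropped entirely (assumed gate), so a
    new workload with no close match gets no transferred configurations.
    """
    scored = [(cosine_similarity(target_features, t["features"]), t)
              for t in history]
    kept = [(s, t) for s, t in scored if s >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [t["best_config"] for _, t in kept[:top_k]]


if __name__ == "__main__":
    # A workload resembling the ETL and join tasks reuses their configs.
    print(select_warm_starts([0.9, 0.1, 0.3], HISTORY))
    # A workload unlike anything in history gets an empty warm-start list.
    print(select_warm_starts([0.0, 0.0, 1.0], HISTORY))
```

In a tuner, the returned configurations would seed the first iterations of the search, and an empty list would mean falling back to default or expert-suggested starting points.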