NFDI4DS | UHH-SEMS - Publication Details

An end-to-end learning-based cost estimator

OPENALEX - Publications

Ji Sun Guoliang Li

Cost and cardinality estimation is vital to the query optimizer, which it guides in plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimation, because they may not effectively capture the correlation between multiple tables. Recently, the database community has shown that learning-based cardinality estimation is better than empirical methods. However, existing learning-based methods have several limitations. Firstly, they focus on estimating cardinality, but cannot estimate cost. Secondly, they are either too heavy or hard...

10.14778/3368289.3368296 article EN Proceedings of the VLDB Endowment 2019-11-01

Database Meets Artificial Intelligence: A Survey

Xuanhe Zhou Chengliang Chai Guoliang Li Ji Sun

Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make database more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view selection) cannot meet the high-performance requirement for large-scale database instances, various applications and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). AI is hard to...

10.1109/tkde.2020.2994641 article EN IEEE Transactions on Knowledge and Data Engineering 2020-05-16

openGauss

Guoliang Li Xuanhe Zhou Ji Sun Yu Xiang Yue Han and 4 more

Although learning-based database optimization techniques have been studied in academia in recent years, they have not been widely deployed in commercial database systems. In this work, we build an autonomous database framework and integrate our proposed learning-based techniques into an open-source database system, openGauss. We propose effective learning-based models to build learned optimizers (including query rewrite, cost/cardinality estimation, join order selection and physical operator selection) and learned database advisors (including self-monitoring, self-diagnosis, self-configuration, and self-optimization)....

10.14778/3476311.3476380 article EN Proceedings of the VLDB Endowment 2021-07-01

Learned cardinality estimation

Ji Sun Jintao Zhang Zhaoyan Sun Guoliang Li Nan Tang

Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially those based on histograms and samplings, have been widely used in commercial and open-source DBMSs. Nevertheless, histograms and samplings can only be used to summarize one or a few columns, which falls short of capturing the joint data distribution over an arbitrary combination of columns, because of the oversimplification of the original relational table(s). Consequently, these traditional methods typically make bad predictions for hard cases such as queries over multiple...

10.14778/3485450.3485459 article EN Proceedings of the VLDB Endowment 2021-09-01

Query performance prediction for concurrent queries using graph embedding

Xuanhe Zhou Ji Sun Guoliang Li Jianhua Feng

Query performance prediction is vital to many database tasks (e.g., database monitoring and query scheduling). Existing methods focus on predicting the performance for a single query but cannot effectively predict the performance for concurrent queries, because it is rather hard to capture the correlations between different queries, e.g., lock conflict and buffer sharing. To address this problem, we propose a performance prediction system for concurrent queries using a graph embedding based model. To the best of our knowledge, this is the first graph-embedding-based performance prediction model for concurrent queries. We encode query features as a graph, where each vertex...

10.14778/3397230.3397238 article EN Proceedings of the VLDB Endowment 2020-05-01

Automatic View Generation with Deep Learning and Reinforcement Learning

Haitao Yuan Guoliang Li Ling Feng Ji Sun Yue Han

Materializing views is an important method to reduce redundant computations in DBMS, especially for processing large-scale analytical queries. However, many existing methods still need DBAs to manually generate materialized views, which is not scalable to a large number of database instances, especially on the cloud database. To address this problem, we propose an automatic view generation method which judiciously selects "highly beneficial" subqueries to generate materialized views. However, there are two challenges. (1) How to estimate the benefit of using a materialized view for a query? (2) How to select...

10.1109/icde48307.2020.00133 article EN 2020 IEEE 36th International Conference on Data Engineering (ICDE) 2020-04-01

Learned Cardinality Estimation for Similarity Queries

Ji Sun Guoliang Li Nan Tang

In this paper, we study the problem of using deep neural networks (DNNs) for estimating the cardinality of similarity queries. Intuitively, DNNs can capture the distribution of data points, and learn to predict the number of data points that are similar to one data point (a similarity search) or a set of data points (a similarity join). However, DNNs are data hungry; directly training a DNN often results in poor performance. We propose two strategies to improve the accuracy and reduce the size of training data: query segmentation and data segmentation. Query segmentation divides a query into query segments, trains a neural network for each query segment,...

10.1145/3448016.3452790 article EN Proceedings of the 2021 International Conference on Management of Data 2021-06-09

An Autonomous Materialized View Management System with Deep Reinforcement Learning

Yue Han Guoliang Li Haitao Yuan Ji Sun

Materialized views (MVs) can significantly optimize the query processing in databases. However, it is hard for ordinary users to generate MVs because doing so relies on background knowledge, and existing methods rely on DBAs to generate and maintain MVs. However, DBAs cannot handle large-scale databases, especially cloud databases that have millions of database instances and support millions of users. This calls for an autonomous MV management system. In this paper, we propose an autonomous materialized view management system, AutoView. It analyzes query workloads, estimates the costs...

10.1109/icde51399.2021.00217 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01

DBMind

Xuanhe Zhou Lianyuan Jin Ji Sun Xinyang Zhao Yu Xiang and 5 more

We demonstrate a self-driving system, DBMind, which provides three autonomous capabilities in database, including self-monitoring, self-diagnosis and self-optimization. First, self-monitoring judiciously collects database metrics and detects anomalies (e.g., slow queries and IO contention), which can profile database status while only slightly affecting system performance (<5%). Then, self-diagnosis utilizes an LSTM model to analyze the root causes of the anomalies and automatically detect root causes from a pre-defined failure hierarchy. Next, self-optimization...

10.14778/3476311.3476334 article EN Proceedings of the VLDB Endowment 2021-07-01

Dima

Ji Sun Zeyuan Shang Guoliang Li Dong Deng Zhifeng Bao

Data analysts in industries spend more than 80% of time on data cleaning and integration in the whole process of data analytics due to data errors and inconsistencies. It calls for effective query processing techniques to tolerate the errors and inconsistencies. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports two core similarity-based query operations, i.e., similarity search and similarity join. Dima extends the SQL programming interface for users to easily invoke these two operations in their data analysis jobs. To avoid expensive data transformation...

10.14778/3137765.3137810 article EN Proceedings of the VLDB Endowment 2017-08-01

Balance-aware distributed string similarity-based query processing system

Ji Sun Zeyuan Shang Guoliang Li Dong Deng Zhifeng Bao

Data analysts spend more than 80% of time on data cleaning and integration in the whole process of data analytics due to data errors and inconsistencies. Similarity-based query processing is an important way to tolerate the errors. However, similarity-based query processing is rather costly and traditional database cannot afford such an expensive requirement. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports four core similarity operations, i.e., similarity selection, similarity join, top-k selection and top-k join. Dima extends SQL for users to easily...

10.14778/3329772.3329774 article EN Proceedings of the VLDB Endowment 2019-05-01

An End-to-End Learning-based Cost Estimator

Ji Sun Guoliang Li

Cost and cardinality estimation is vital to the query optimizer, which it guides in plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimation, because they cannot capture the correlation between multiple columns. Recently, the database community has shown that learning-based cardinality estimation is better than empirical methods. However, existing learning-based methods have several limitations. Firstly, they can only estimate cardinality, but cannot estimate cost. Secondly, a convolutional neural network (CNN) with average pooling is hard to represent...

10.48550/arxiv.1906.02560 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Universal Alignment Probability Revisited

Zong‐Yang Shen Qianchuan Zhao Qing‐Shan Jia Ji Sun

10.1007/s10957-008-9498-8 article EN Journal of Optimization Theory and Applications 2008-12-18

Hydraulic Transient Prevention with Dipping Tube Hydropneumatic Tank

Rong He Wang Zhi Xun Wang Feng Zhang Ji Sun Xiao Xue Wang and 2 more

The dipping tube hydropneumatic tank is one of the most efficient devices for preventing water hammer in water distribution and long-distance water transmission pipe systems. Owing to its low cost and easy maintenance, the dipping tube hydropneumatic tank has many irreplaceable advantages; however, it is difficult to determine the correct tank size and gas volume for real-world engineering applications. This paper presents a robust method to solve these problems from theory to application. Based on the Method of Characteristics (MOC) equations, this paper derives equations for modeling...

10.4028/www.scientific.net/amm.316-317.762 article EN Applied Mechanics and Materials 2013-04-01

Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation

Ji Sun Dong Deng Ihab F. Ilyas Guoliang Li Samuel Madden and 3 more

An end-to-end data integration system requires human feedback in several phases, including collecting training data for entity matching, debugging the resulting clusters, confirming transformations applied on these clusters for data standardization, and finally, reducing each cluster to a single, canonical representation (or "golden record"). The traditional wisdom is to sequentially apply the human feedback, obtained by asking specific questions, within some budget in each phase. However, these questions are highly correlated;...

10.48550/arxiv.1906.06574 preprint EN other-oa arXiv (Cornell University) 2019-01-01

AlphaQO: Robust Learned Query Optimizer

X. D. Yu Chengliang Chai Xinning Zhang Nan Tang Ji Sun and 1 more

Fund Project: National Natural Science Foundation of China (61925205, 61632016), Huawei, and TAL. Abstract: Recently, learned query optimizers, typically driven by deep learning models, have attracted wide attention as they can offer similar or even better performance than state-of-the-art...

10.21655/ijsi.1673-7288.00275 article EN International Journal of Software and Informatics 2022-01-01

AutoView: An Autonomous Materialized View Management System with Encoder-Reducer

Yue Han Guoliang Li Haitao Yuan Ji Sun

Materialized views (MVs) can significantly optimize the query processing in databases. However, it is hard for ordinary users to generate MVs because doing so relies on background knowledge, and existing methods rely on DBAs to generate and maintain MVs. However, DBAs cannot handle large-scale databases, especially cloud databases that have millions of database instances and support millions of users. This calls for an autonomous MV management system. In this paper, we propose an autonomous materialized view management system. It analyzes query workloads, estimates the costs and benefits...

10.1109/tkde.2022.3163195 article EN IEEE Transactions on Knowledge and Data Engineering 2022-01-01

Database Meets Artificial Intelligence: A Survey (Extended Abstract)

Xuanhe Zhou Chengliang Chai Guoliang Li Ji Sun

Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make database more intelligent (AI4DB). It is challenging for empirical database optimization techniques (e.g., configuration tuning, query optimization) to meet the high-performance requirement for large-scale database instances, various applications, and diversified users. Learning-based techniques can alleviate this problem by exploring high-quality optimization strategies and reusing historical data/models. On the other hand, database techniques can optimize AI models (DB4AI). AI is hard to deploy in real...

10.1109/icde55515.2023.00377 article EN 2023 IEEE 39th International Conference on Data Engineering (ICDE) 2023-04-01