- Data Management and Algorithms
- Advanced Database Systems and Queries
- Data Quality and Management
- Data Stream Mining Techniques
- Cloud Computing and Resource Management
- Advanced Image and Video Retrieval Techniques
- Caching and Content Delivery
- Water Systems and Optimization
- Domain Adaptation and Few-Shot Learning
- Hydraulic and Pneumatic Systems
- Multimodal Machine Learning Applications
- Advanced Multi-Objective Optimization Algorithms
- Simulation Techniques and Applications
- Combustion and Detonation Processes
- Network Security and Intrusion Detection
- Machine Learning and Data Classification
- Software System Performance and Reliability
- Markov Chains and Monte Carlo Methods
- Structural Integrity and Reliability Analysis
- Stochastic processes and statistical mechanics
- Optimization and Search Problems
- Data-Driven Disease Surveillance
- Privacy-Preserving Technologies in Data
- Particle physics theoretical and experimental studies
Tsinghua University
2008-2023
Cost and cardinality estimation is vital to the query optimizer, which it guides in plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they may not effectively capture the correlation between multiple tables. Recently, the database community has shown that learning-based cardinality estimation is better than the empirical methods. However, existing learning-based methods have several limitations. Firstly, they focus on estimating the cardinality, but cannot estimate the cost. Secondly, they are either too heavy or hard...
Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view selection) cannot meet the high-performance requirements of large-scale database instances, various applications and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). AI is hard to...
Although learning-based database optimization techniques have been studied in academia in recent years, they have not been widely deployed in commercial database systems. In this work, we build an autonomous database framework and integrate our proposed techniques into the open-source database system openGauss. We propose effective learning-based models to build learned optimizers (including query rewrite, cost/cardinality estimation, join order selection and physical operator selection) and learned database advisors (including self-monitoring, self-diagnosis, self-configuration, and self-optimization)....
Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially those based on histograms and sampling, have been widely used in commercial and open-source DBMSs. Nevertheless, histograms and sampling can only be used to summarize one or a few columns, and so fall short of capturing the joint data distribution over an arbitrary combination of columns, because of the oversimplification of the original relational table(s). Consequently, these traditional methods typically make bad predictions for hard cases such as queries over multiple...
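The limitation described above can be illustrated with a minimal sketch, assuming one equi-width histogram per column and the common attribute-independence assumption for combining predicates. The function names and the two-column toy table are hypothetical and are not taken from the paper.

```python
def build_histogram(values, num_buckets, lo, hi):
    """Equi-width histogram over [lo, hi): returns (bucket counts, bucket width)."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def selectivity_less_than(hist, lo, bound, total):
    """Estimate P(col < bound), assuming values are uniform inside each bucket."""
    counts, width = hist
    matched = 0.0
    for i, count in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        if bound >= b_hi:          # bucket lies entirely below the bound
            matched += count
        elif bound > b_lo:         # bucket is cut by the bound
            matched += count * (bound - b_lo) / width
    return matched / total


def independence_estimate(rows, a_bound, b_bound, num_buckets=10):
    """Estimate |a < a_bound AND b < b_bound| from two single-column histograms."""
    total = len(rows)
    sel = 1.0
    for col, bound in ((0, a_bound), (1, b_bound)):
        values = [r[col] for r in rows]
        lo, hi = min(values), max(values) + 1
        hist = build_histogram(values, num_buckets, lo, hi)
        sel *= selectivity_less_than(hist, lo, bound, total)  # independence assumption
    return sel * total


def true_cardinality(rows, a_bound, b_bound):
    """Exact answer, computed by scanning the table."""
    return sum(1 for a, b in rows if a < a_bound and b < b_bound)
```

On a table where the two columns are perfectly correlated, for example `rows = [(i, i) for i in range(1000)]`, the query `a < 100 AND b < 100` matches 100 rows, while `independence_estimate(rows, 100, 100)` returns roughly 10. Each single-column histogram is accurate here. The tenfold underestimate comes entirely from ignoring the joint distribution.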
Query performance prediction is vital to many database tasks (e.g., database monitoring and query scheduling). Existing methods focus on predicting the performance of a single query but cannot effectively predict the performance of concurrent queries, because it is rather hard to capture the correlations between different queries, e.g., lock conflict and buffer sharing. To address this problem, we propose a performance prediction system for concurrent queries using a graph-embedding-based model. To the best of our knowledge, this is the first graph-embedding-based performance prediction model for concurrent queries. We encode the query features into a graph, where each vertex...
Materializing views is an important method to reduce redundant computations in DBMSs, especially for processing large-scale analytical queries. However, many existing methods still need DBAs to manually generate materialized views, which is not scalable to a large number of database instances, especially on cloud databases. To address this problem, we propose an automatic view generation method that judiciously selects "highly beneficial" subqueries to generate materialized views. However, there are two challenges. (1) How to estimate the benefit of using a materialized view for a query? (2) How to select...
In this paper, we study the problem of using deep neural networks (DNNs) for estimating the cardinality of similarity queries. Intuitively, DNNs can capture the distribution of data points, and learn to predict the number of data points that are similar to one data point (a similarity search) or a set of data points (a similarity join). However, DNNs are data hungry; directly training a DNN often results in poor performance. We propose two strategies to improve the accuracy and reduce the size of the training data: query segmentation and data segmentation. Query segmentation divides a query into segments, trains a neural network for each segment,...
Materialized views (MVs) can significantly optimize query processing in databases. However, it is hard for ordinary users to generate MVs because doing so relies on background knowledge, and existing methods rely on DBAs to generate and maintain MVs. However, DBAs cannot handle large-scale databases, especially cloud databases that have millions of database instances and support millions of users. This calls for an autonomous MV management system. In this paper, we propose an autonomous materialized view management system, AutoView. It analyzes query workloads, estimates the costs...
We demonstrate a self-driving database system, DBMind, which provides three autonomous capabilities in the database: self-monitoring, self-diagnosis and self-optimization. First, self-monitoring judiciously collects database metrics and detects anomalies (e.g., slow queries and IO contention), and can profile the database status while only slightly affecting system performance (<5%). Then, self-diagnosis utilizes an LSTM model to analyze the root causes of the anomalies and automatically detect root causes from a pre-defined failure hierarchy. Next, self-optimization...
Data analysts in industry spend more than 80% of their time on data cleaning and integration in the whole process of data analytics, due to data errors and inconsistencies. This calls for effective query processing techniques that tolerate errors and inconsistencies. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports two core similarity-based query operations, i.e., similarity search and similarity join. Dima extends the SQL programming interface so that users can easily invoke these two operations in their data analysis jobs. To avoid expensive data transformation...
Data analysts spend more than 80% of their time on data cleaning and integration in the whole process of data analytics, due to data errors and inconsistencies. Similarity-based query processing is an important way to tolerate the errors. However, similarity-based query processing is rather costly, and traditional databases cannot afford such an expensive requirement. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports four core similarity operations, i.e., similarity selection, similarity join, top-k selection and top-k join. Dima extends SQL for users to easily...
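To make the operations named above concrete, here is a minimal single-machine sketch of similarity selection, similarity join and top-k selection. It assumes Jaccard similarity over whitespace tokens and uses brute-force scans. Both are illustrative assumptions: this is not Dima's distributed, index-based implementation, and all function names are hypothetical.

```python
import heapq


def tokenize(text):
    """Lower-case whitespace tokenization into a set of tokens."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_selection(records, query, threshold):
    """Return the records whose similarity to the query is >= threshold."""
    q = tokenize(query)
    return [r for r in records if jaccard(tokenize(r), q) >= threshold]


def similarity_join(left, right, threshold):
    """Return all (l, r) pairs whose similarity is >= threshold (nested loop)."""
    right_tokens = [(r, tokenize(r)) for r in right]
    pairs = []
    for l in left:
        lt = tokenize(l)
        for r, rt in right_tokens:
            if jaccard(lt, rt) >= threshold:
                pairs.append((l, r))
    return pairs


def topk_selection(records, query, k):
    """Return the k records most similar to the query, most similar first."""
    q = tokenize(query)
    return heapq.nlargest(k, records, key=lambda r: jaccard(tokenize(r), q))
```

For example, `similarity_selection(["Tsinghua University Beijing", "Tsinghua Univ Beijing", "Peking University"], "tsinghua university beijing", 0.5)` keeps the first two records and drops the third. The nested-loop join is quadratic in the input size, which is the kind of cost that motivates a distributed, index-based system.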
Cost and cardinality estimation is vital to the query optimizer, which it guides in plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they cannot capture the correlation between multiple columns. Recently, the database community has shown that learning-based cardinality estimation is better than the empirical methods. However, existing learning-based methods have several limitations. Firstly, they can only estimate the cardinality, but not the cost. Secondly, a convolutional neural network (CNN) with average pooling is hard to represent...
The dipping tube hydropneumatic tank is one of the most efficient pieces of equipment for preventing water hammer in water distribution and long-distance water transmission pipe systems. Owing to its low cost and ease of maintenance, it has many irreplaceable advantages; however, it is difficult to determine the correct size and gas volume for real-world engineering applications. This paper presents a robust method to solve these problems from theory to application. Based on the Method of Characteristics (MOC) equations, this paper derives equations for modeling...
An end-to-end data integration system requires human feedback in several phases, including collecting training data for entity matching, debugging the resulting clusters, confirming transformations applied on these clusters for data standardization, and finally, reducing each cluster to a single, canonical representation (or "golden record"). The traditional wisdom is to sequentially apply human feedback, obtained by asking specific questions, within some budget in each phase. However, these questions are highly correlated;...
AlphaQO: Robust Learned Query Optimizer. DOI: 10.21655/ijsi.1673-7288.00275. Fund Project: National Natural Science Foundation of China (61925205, 61632016), Huawei, and TAL. Abstract: Recently, learned query optimizers, typically driven by deep learning models, have attracted wide attention as they can offer similar or even better performance than state-of-the-art...
Materialized views (MVs) can significantly optimize query processing in databases. However, it is hard for ordinary users to generate MVs because doing so relies on background knowledge, and existing methods rely on DBAs to generate and maintain MVs. However, DBAs cannot handle large-scale databases, especially cloud databases that have millions of database instances and support millions of users. This calls for an autonomous MV management system. In this paper, we propose an autonomous materialized view management system. It analyzes query workloads, estimates the costs and benefits...
Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). It is challenging for empirical database optimization techniques (e.g., configuration tuning, query optimization) to meet the high-performance requirements of large-scale database instances, various applications, and diversified users. Learning-based techniques can alleviate this problem by exploring high-quality optimization strategies and reusing historical data/models. On the other hand, database techniques can optimize AI models (DB4AI). AI is hard to deploy in real...