Ji Sun

ORCID: 0000-0002-9782-7201
Research Areas
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Data Quality and Management
  • Data Stream Mining Techniques
  • Cloud Computing and Resource Management
  • Advanced Image and Video Retrieval Techniques
  • Caching and Content Delivery
  • Water Systems and Optimization
  • Domain Adaptation and Few-Shot Learning
  • Hydraulic and Pneumatic Systems
  • Multimodal Machine Learning Applications
  • Advanced Multi-Objective Optimization Algorithms
  • Simulation Techniques and Applications
  • Combustion and Detonation Processes
  • Network Security and Intrusion Detection
  • Machine Learning and Data Classification
  • Software System Performance and Reliability
  • Markov Chains and Monte Carlo Methods
  • Structural Integrity and Reliability Analysis
  • Stochastic processes and statistical mechanics
  • Optimization and Search Problems
  • Data-Driven Disease Surveillance
  • Privacy-Preserving Technologies in Data
  • Particle physics theoretical and experimental studies

Tsinghua University
2008-2023

Cost and cardinality estimation is vital to the query optimizer, as it guides query plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimation, because they may not effectively capture the correlation between multiple tables. Recently, the database community has shown that learning-based cardinality estimation is better than the empirical methods. However, existing learning-based methods have several limitations. Firstly, they focus on estimating the cardinality, but cannot estimate the cost. Secondly, they are either too heavy or hard...

10.14778/3368289.3368296 article EN Proceedings of the VLDB Endowment 2019-11-01

Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view selection) cannot meet the high-performance requirement for large-scale database instances, various applications and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). For example, AI is hard to...

10.1109/tkde.2020.2994641 article EN IEEE Transactions on Knowledge and Data Engineering 2020-05-16

Although learning-based database optimization techniques have been studied in academia in recent years, they have not been widely deployed in commercial database systems. In this work, we build an autonomous database framework and integrate our proposed learning-based techniques into the open-source database system openGauss. We propose effective learning-based models to build learned optimizers (including query rewrite, cost/cardinality estimation, join order selection and physical operator selection) and learned database advisors (including self-monitoring, self-diagnosis, self-configuration, and self-optimization)....

10.14778/3476311.3476380 article EN Proceedings of the VLDB Endowment 2021-07-01

Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially those based on histograms and samplings, have been widely used in commercial and open-source DBMSs. Nevertheless, histograms and samplings can only be used to summarize one or a few columns, which fall short of capturing the joint data distribution over an arbitrary combination of columns, because of the oversimplification of the original relational table(s). Consequently, these traditional methods typically make bad predictions for hard cases such as queries over multiple...

10.14778/3485450.3485459 article EN Proceedings of the VLDB Endowment 2021-09-01

Query performance prediction is vital to many database tasks (e.g., database monitoring and query scheduling). Existing methods focus on predicting the performance of a single query but cannot effectively predict the performance of concurrent queries, because it is rather hard to capture the correlations between different queries, e.g., lock conflict and buffer sharing. To address this problem, we propose a performance prediction system for concurrent queries using a graph-embedding-based model. To the best of our knowledge, this is the first graph-embedding-based performance prediction model for concurrent queries. We encode features, where each vertex...

10.14778/3397230.3397238 article EN Proceedings of the VLDB Endowment 2020-05-01

Materializing views is an important method to reduce redundant computations in DBMS, especially for processing large-scale analytical queries. However, many existing methods still need DBAs to manually generate materialized views, which is not scalable to a large number of database instances, especially on the cloud database. To address this problem, we propose an automatic view generation method which judiciously selects "highly beneficial" subqueries to generate materialized views. However, there are two challenges. (1) How to estimate the benefit of using a materialized view for a query? (2) How to select...

10.1109/icde48307.2020.00133 article EN 2020 IEEE 36th International Conference on Data Engineering (ICDE) 2020-04-01

In this paper, we study the problem of using deep neural networks (DNNs) for estimating the cardinality of similarity queries. Intuitively, DNNs can capture the distribution of data points, and learn to predict the number of data points that are similar to one data point (a similarity search) or a set of data points (a similarity join). However, DNNs are data hungry; directly training a DNN often results in poor performance. We propose two strategies to improve the accuracy and reduce the size of training data: query segmentation and data segmentation. Query segmentation divides a query into query segments, trains a neural network for each query segment,...

10.1145/3448016.3452790 article EN Proceedings of the 2021 International Conference on Management of Data 2021-06-09

Materialized views (MVs) can significantly optimize the query processing in databases. However, it is hard for ordinary users to generate MVs because doing so relies on background knowledge, and existing methods rely on DBAs to generate and maintain MVs. However, DBAs cannot handle large-scale databases, especially cloud databases that have millions of database instances and support millions of users. This calls for an autonomous MV management system. In this paper, we propose an autonomous materialized view management system, AutoView. It analyzes query workloads, estimates the costs...

10.1109/icde51399.2021.00217 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01

We demonstrate a self-driving system, DBMind, which provides three autonomous capabilities in the database: self-monitoring, self-diagnosis and self-optimization. First, self-monitoring judiciously collects database metrics and detects anomalies (e.g., slow queries and IO contention), and can profile database status while only slightly affecting system performance (<5%). Then, self-diagnosis utilizes an LSTM model to analyze the root causes of the anomalies and automatically detect root causes from a pre-defined failure hierarchy. Next, self-optimization...

10.14778/3476311.3476334 article EN Proceedings of the VLDB Endowment 2021-07-01

Data analysts in industry spend more than 80% of their time on data cleaning and integration in the whole process of data analytics due to data errors and inconsistencies. This calls for effective query processing techniques that tolerate the errors and inconsistencies. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports two core similarity-based query operations, i.e., similarity search and similarity join. Dima extends the SQL programming interface so that users can easily invoke these two operations in their data analysis jobs. To avoid expensive data transformation...

10.14778/3137765.3137810 article EN Proceedings of the VLDB Endowment 2017-08-01

Data analysts spend more than 80% of their time on data cleaning and integration in the whole process of data analytics due to data errors and inconsistencies. Similarity-based query processing is an important way to tolerate the errors and inconsistencies. However, similarity-based query processing is rather costly, and traditional databases cannot afford such an expensive requirement. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports four core similarity operations, i.e., similarity selection, similarity join, top-k selection and top-k join. Dima extends SQL for users to easily...

10.14778/3329772.3329774 article EN Proceedings of the VLDB Endowment 2019-05-01

Cost and cardinality estimation is vital to the query optimizer, as it guides query plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimation, because they cannot capture the correlation between multiple columns. Recently, the database community has shown that learning-based cardinality estimation is better than the empirical methods. However, existing learning-based methods have several limitations. Firstly, they can only estimate the cardinality, but cannot estimate the cost. Secondly, a convolutional neural network (CNN) with average pooling is hard to represent...

10.48550/arxiv.1906.02560 preprint EN other-oa arXiv (Cornell University) 2019-01-01

10.1007/s10957-008-9498-8 article EN Journal of Optimization Theory and Applications 2008-12-18

The dipping tube hydropneumatic tank is one of the most efficient pieces of equipment for preventing water hammer in water distribution and long-distance water transmission pipe systems. Owing to its low cost and ease of maintenance, the dipping tube hydropneumatic tank has many irreplaceable advantages; however, it is difficult to determine the correct tank size and gas volume for real-world engineering applications. This paper presents a robust method to solve these problems from theory to application. Based on the Method of Characteristics (MOC) equations, this paper derives the equations for modeling...

10.4028/www.scientific.net/amm.316-317.762 article EN Applied Mechanics and Materials 2013-04-01

An end-to-end data integration system requires human feedback in several phases, including collecting training data for entity matching, debugging the resulting clusters, confirming transformations applied on these clusters for data standardization, and finally, reducing each cluster to a single, canonical representation (or "golden record"). The traditional wisdom is to sequentially apply the human feedback, obtained by asking specific questions, within some budget in each phase. However, these questions are highly correlated;...

10.48550/arxiv.1906.06574 preprint EN other-oa arXiv (Cornell University) 2019-01-01

AlphaQO: Robust Learned Query Optimizer. Fund Project: National Natural Science Foundation of China (61925205, 61632016), Huawei, and TAL. Abstract: Recently, learned query optimizers, typically driven by deep learning models, have attracted wide attention as they can offer similar or even better performance than state-of-the-art...

10.21655/ijsi.1673-7288.00275 article EN International Journal of Software and Informatics 2022-01-01

Materialized views (MVs) can significantly optimize the query processing in databases. However, it is hard for ordinary users to generate MVs because doing so relies on background knowledge, and existing methods rely on DBAs to generate and maintain MVs. However, DBAs cannot handle large-scale databases, especially cloud databases that have millions of database instances and support millions of users. This calls for an autonomous MV management system. In this paper, we propose an autonomous materialized view management system. It analyzes workloads, estimates the costs and benefits...

10.1109/tkde.2022.3163195 article EN IEEE Transactions on Knowledge and Data Engineering 2022-01-01

Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). It is challenging for empirical database optimization techniques (e.g., configuration tuning, query optimization) to meet the high-performance requirement for large-scale database instances, various applications, and diversified users. Learning-based techniques can alleviate this problem by exploring high-quality optimization strategies and reusing historical data/models. On the other hand, database techniques can optimize AI models (DB4AI). AI is hard to deploy in real...

10.1109/icde55515.2023.00377 article EN 2023 IEEE 39th International Conference on Data Engineering (ICDE) 2023-04-01