Kai Zeng

ORCID: 0009-0005-5788-5668
Research Areas
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Data Stream Mining Techniques
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • Semantic Web and Ontologies
  • Advanced Image and Video Retrieval Techniques
  • Scientific Computing and Data Management
  • Stochastic Gradient Optimization Techniques
  • Time Series Analysis and Forecasting
  • Graph Theory and Algorithms
  • Algorithms and Data Compression
  • Advanced Neural Network Applications
  • Traffic Prediction and Management Techniques
  • Recommender Systems and Techniques
  • Geographic Information Systems Studies
  • Distributed Systems and Fault Tolerance
  • Bayesian Modeling and Causal Inference
  • Target Tracking and Data Fusion in Sensor Networks
  • Generative Adversarial Networks and Image Synthesis
  • IoT and Edge/Fog Computing
  • 3D Shape Modeling and Analysis
  • Autonomous Vehicle Technology and Safety
  • Radar Systems and Signal Processing
  • Face Recognition and Analysis

Huawei Technologies (China)
2023-2025

Kunming University of Science and Technology
2021-2024

Kunming University
2024

Alibaba Group (China)
2019-2023

Alibaba Group (United States)
2020-2023

University of Science and Technology of China
2023

Shenzhen University
2023

Zhejiang A & F University
2006-2023

Huawei Technologies (United Kingdom)
2023

University of Electronic Science and Technology of China
2010-2022

Much work has been devoted to supporting RDF data. But state-of-the-art systems and methods still cannot handle web scale RDF data effectively. Furthermore, many useful and general purpose graph-based operations (e.g., random walk, reachability, community discovery) on RDF data are not supported, as most existing systems store and index data in particular ways (e.g., as relational tables or as a bitmap matrix) to maximize one particular operation on RDF data: SPARQL query processing. In this paper, we introduce Trinity.RDF, a distributed, memory-based graph...

10.14778/2535570.2488333 article EN Proceedings of the VLDB Endowment 2013-02-01
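
Reachability is one of the graph operations that, according to the abstract, table- or bitmap-based RDF stores do not support well. The sketch below is a rough single-machine illustration only and does not reproduce Trinity.RDF's distributed design. It keeps triples in an adjacency index keyed by subject, so a reachability check becomes a breadth-first traversal rather than a chain of self-joins. The function names and sample triples are hypothetical.

```python
from collections import deque

def build_adjacency(triples):
    """Index (subject, predicate, object) triples as subject -> list of objects."""
    adj = {}
    for subj, _pred, obj in triples:
        adj.setdefault(subj, []).append(obj)
    return adj

def reachable(adj, source, target):
    """Breadth-first search over the adjacency index; True if target is reachable."""
    seen = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Hypothetical toy triples, not taken from the paper.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("dave", "knows", "alice"),
]
adj = build_adjacency(triples)
print(reachable(adj, "alice", "carol"))  # True
print(reachable(adj, "carol", "alice"))  # False
```

Edges are directed here, so reachability is not symmetric. A relational store would need one self-join per hop to answer the same question.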

Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for 'big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of the degree of accuracy...

10.14778/2336664.2336675 article EN Proceedings of the VLDB Endowment 2012-06-01

Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research that is now made more critical by the need for timely and cost-effective analytics over "Big Data". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second...

10.1145/2588555.2588579 article EN 2014-06-18
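
For reference, here is a minimal sketch of the classical (Monte Carlo) percentile bootstrap, which the abstract names as approach (ii). It is not the paper's own method. The function name, its parameters and the sample data are illustrative assumptions. The code shows where the method's cost comes from: the estimator is re-run once per resample.

```python
import random
import statistics

def bootstrap_ci(sample, estimator=statistics.fmean, resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: re-evaluate the estimator on `resamples` samples drawn
    with replacement, return (point estimate, lower bound, upper bound)."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(estimator(rng.choices(sample, k=n)) for _ in range(resamples))
    lower = estimates[int(alpha / 2 * resamples)]
    upper = estimates[int((1 - alpha / 2) * resamples) - 1]
    return estimator(sample), lower, upper

# Hypothetical AQP setting: estimate AVG over 100,000 rows from a 500-row sample.
population = list(range(1, 100_001))
sample = random.Random(1).sample(population, 500)
point, lower, upper = bootstrap_ci(sample)
print(f"AVG ~ {point:.1f}, 95% CI [{lower:.1f}, {upper:.1f}]")
```

With 1,000 resamples of 500 rows, the estimator is applied to 500,000 values to produce one interval. An analytic formula for AVG would need only a single pass over the sample.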

Nearly 15 years ago, Hellerstein, Haas and Wang proposed online aggregation (OLA), a technique that allows users to (1) observe the progress of a query by showing iteratively refined approximate answers, and (2) stop the query execution once its result achieves the desired accuracy. In this demonstration, we present G-OLA, a novel mini-batch execution model that generalizes OLA to support general OLAP queries with arbitrarily nested aggregates using efficient delta maintenance techniques. We have implemented G-OLA in FluoDB,...

10.1145/2723372.2735381 article EN 2015-05-27
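
The abstract names two OLA properties: iteratively refined answers, and early termination once a target accuracy is reached. Both can be sketched for a single AVG over rows read in random order. The sketch is the basic single-aggregate form of online aggregation with a normal-approximation (CLT) interval. It is not G-OLA's mini-batch delta maintenance for nested aggregates. The names and thresholds are assumptions.

```python
import math
import random

def online_avg(stream, rel_error=0.01, z=1.96, min_rows=30):
    """Refine a running AVG row by row (Welford's update) and stop once the
    ~95% CLT half-width falls within rel_error of the current estimate."""
    n, mean, m2 = 0, 0.0, 0.0  # m2 = sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_rows:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
            if half_width <= rel_error * abs(mean):
                return mean, half_width, n  # desired accuracy reached: stop early
    half_width = z * math.sqrt(m2 / (n - 1) / n) if n > 1 else math.inf
    return mean, half_width, n

# Hypothetical table of 100,000 values, already in random order as OLA requires.
rng = random.Random(42)
rows = [rng.uniform(0, 100) for _ in range(100_000)]
est, hw, used = online_avg(rows, rel_error=0.01)
print(f"AVG ~ {est:.2f} +/- {hw:.2f} after {used} of {len(rows)} rows")
```

The random-order assumption matters. If rows arrived sorted by value, the running mean would be biased and the interval would be meaningless.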

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answers the fundamental question: to what extent can these methods improve the performance of a query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we...

10.14778/3503585.3503586 article EN Proceedings of the VLDB Endowment 2021-12-01

Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over-simplify the models, only using independent factorization, which leads to inaccurate estimates, or over-complicate them by lossless conditional factorization without any independence assumption, which results in slow probability computation. In this paper, we propose FLAT, a...

10.14778/3461535.3461539 article EN Proceedings of the VLDB Endowment 2021-05-01
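
The abstract's first failure mode, inaccurate estimates from independent factorization, can be seen on a tiny correlated table. The sketch estimates a conjunctive equality predicate as n · P(city) · P(country) and compares the estimate with the true count. It illustrates only the independence baseline, not FLAT. The table and function names are hypothetical.

```python
from collections import Counter

# Hypothetical table whose two columns are perfectly correlated (city -> country).
rows = (
    [("Paris", "France")] * 40
    + [("Lyon", "France")] * 10
    + [("Berlin", "Germany")] * 50
)

def true_count(rows, city, country):
    """Exact cardinality of: city = ? AND country = ?"""
    return sum(1 for c, k in rows if c == city and k == country)

def independence_estimate(rows, city, country):
    """Estimate assuming P(city, country) = P(city) * P(country)."""
    n = len(rows)
    city_freq = Counter(c for c, _ in rows)
    country_freq = Counter(k for _, k in rows)
    return n * (city_freq[city] / n) * (country_freq[country] / n)

print(true_count(rows, "Paris", "France"))             # 40
print(independence_estimate(rows, "Paris", "France"))  # 20.0
print(true_count(rows, "Paris", "Germany"))            # 0
```

Under independence, an impossible combination such as (Paris, Germany) also receives an estimate of 20 rows. An optimizer that trusts such estimates can pick poor join orders.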

Query optimization has long been a fundamental yet challenging topic in the database field. With the prosperity of machine learning (ML), some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizers. However, they suffer from fundamental limitations due to the data-driven nature of ML. Motivated by the ML characteristics and database maturity, we propose LEON, a framework for ML-aidEd query OptimizatioN. LEON improves the expert query optimizer to self-adjust to the particular deployment by leveraging ML and the fundamental knowledge in the expert query optimizer. To train the ML model,...

10.14778/3598581.3598597 article EN Proceedings of the VLDB Endowment 2023-05-01

Forest tree species information plays an important role in ecology and forest management, and deep learning has been used widely for remote sensing image classification in recent years. However, forest tree species classification using remote sensing images is still a difficult task. Since there is no benchmark dataset for forest tree species, a forest tree species dataset (FTSD) was built in this paper to fill the gap, based on Sentinel-2 images. The FTSD contained nine kinds of forest tree species in Qingyuan County with 8,815 images, each with a resolution of 64 × 64 pixels. The images were produced by combining forest management inventory data...

10.3390/su15032741 article EN Sustainability 2023-02-02

Much research attention has been given to delivering high-performance systems that are capable of complex event processing (CEP) in a wide range of applications. However, many current CEP systems focus on efficiently processing data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. XML streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial records, configuration files, and similar...

10.1145/2213836.2213866 article EN 2012-05-20

Stream-processing workloads and modern shared cluster environments exhibit high variability and unpredictability. Combined with the large parameter space and the diverse set of user SLOs, this makes modern streaming systems very challenging to statically configure and tune. To address these issues, in this paper we investigate a novel control-plane design, Chi, which supports continuous monitoring and feedback, and enables dynamic re-configuration. Chi leverages the key insight of embedding control-plane messages in the data-plane channels to achieve...

10.14778/3231751.3231765 article EN Proceedings of the VLDB Endowment 2018-06-01

Owing to the technology of 5G and beyond, collaborative edge computing-as-a-service has enabled trillions of interconnected applications. It has also become a prospective paradigm for providing computing services by offloading computationally intensive assignments to mobile-edge servers or fog nodes, due to the terminals' constrained computing and caching resources. Nevertheless, in this process, trust scheduling and data sharing in heterogeneous systems is an unavoidable challenge of paramount importance. As a powerful tool that...

10.1109/jiot.2021.3058125 article EN IEEE Internet of Things Journal 2021-02-10

In the realm of big data and cloud analytics, efficiently managing and retrieving high-dimensional data presents a critical challenge. Traditional indexes often struggle with the storage overhead inherent in large datasets. There is a growing interest in the adoption of Small Materialized Aggregates (SMA) among database vendors due to their ability to maintain lightweight block-level metadata, facilitating efficient block skipping. However, SMA performance relies heavily on data layout. This is especially critical in scenarios with wide tables...

10.1145/3709710 article EN Proceedings of the ACM on Management of Data 2025-02-10
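
The sketch below illustrates SMA-style block skipping on one integer column stored in fixed-size blocks. Each block keeps its minimum and maximum, and a scan skips any block whose range cannot satisfy the predicate. It also illustrates the abstract's point that effectiveness depends on data layout: the same range query touches one block when the column is sorted and nearly every block when it is shuffled. All names are hypothetical, and the code does not reproduce the layout optimization the paper proposes.

```python
import random
from dataclasses import dataclass

@dataclass
class Block:
    values: list
    lo: int  # SMA metadata: minimum value in the block
    hi: int  # SMA metadata: maximum value in the block

def build_blocks(values, block_size):
    """Split a column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks

def range_query(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually scanned."""
    result, scanned = [], 0
    for b in blocks:
        if b.hi < low or b.lo > high:
            continue  # metadata proves no value in this block can match
        scanned += 1
        result.extend(v for v in b.values if low <= v <= high)
    return result, scanned

data = list(range(10_000))
shuffled = data[:]
random.Random(7).shuffle(shuffled)

results = {}
for name, layout in [("sorted", data), ("shuffled", shuffled)]:
    blocks = build_blocks(layout, block_size=100)
    rows, scanned = range_query(blocks, 2_000, 2_099)
    results[name] = (len(rows), scanned)
    print(name, len(rows), "rows,", scanned, "of", len(blocks), "blocks scanned")
```

Both layouts return the same 100 rows. In the shuffled layout, almost every block's [min, max] range covers the predicate, so the metadata cannot rule the block out.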

Due to the prevalence of GPS-enabled devices and wireless communication technology, spatial trajectories that describe the movement history of moving objects are being generated and accumulated at an unprecedented pace. However, a raw trajectory in the form of a sequence of timestamped locations does not make much sense for humans without semantic representation. In this work we aim to facilitate humans' understanding of a raw trajectory by automatically generating a short text to describe it. By formulating this task as the problem of adaptive trajectory segmentation...

10.1109/icde.2015.7113348 article EN 2015-04-01

The size of data and the complexity of analytics continue to grow along with the need for timely and cost-effective analysis. However, the growth of computation power cannot keep up with the growth of data. This calls for a paradigm shift from the traditional batch OLAP processing model to an incremental OLAP processing model. In this paper, we propose iOLAP, an incremental OLAP query engine that provides a smooth trade-off between query accuracy and latency, and fulfills a full spectrum of user requirements, from approximate but timely query execution to a more traditional accurate query execution. iOLAP enables interactive incremental query processing using...

10.1145/2882903.2915240 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14

Some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizers. These works often use the cost (i.e., the estimate of the cost model) or the latency (i.e., the execution time) as guidance signals for training their learned models. However, cost-based training underperforms in latency, and latency-based training is time-intensive. In order to bypass such a dilemma, researchers attempt to transfer a learned value network from the cost domain to the latency domain. We recognize critical insights in cost/latency-based training, prompting us to transfer the reward function rather...

10.14778/3594512.3594525 article EN Proceedings of the VLDB Endowment 2023-04-01

Approximate Query Processing (AQP) based on sampling is critical for supporting timely and cost-effective analytics over big data. To be applied successfully, AQP must be accompanied by reliable estimates of the quality of sample-produced approximate answers; the two main techniques used in the past for this purpose are (i) closed-form analytic error estimation, and (ii) the bootstrap method. Approach (i) is extremely efficient but lacks generality, whereas approach (ii) is general but suffers from high computational overhead. Our recently...

10.1145/2588555.2594532 article EN 2014-06-18

We present FleetRec, a high-performance and scalable recommendation inference system within tight latency constraints. FleetRec takes advantage of heterogeneous hardware, including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory to different types of hardware and bridging their connections by a high-speed network, FleetRec gains the best of both worlds, and can naturally scale out by adding nodes to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms optimized...

10.1145/3447548.3467139 article EN 2021-08-12

Wide adoption of GPS-enabled devices generates large amounts of trajectories every day. The raw trajectory data describes the movement history of moving objects by a sequence of <longitude, latitude, time-stamp> triples, which are nonintuitive for humans to perceive the prominent features of the trajectory, such as where and how the moving object travels. In this demo, we present the STMaker system to help users make sense of individual trajectories. Given a trajectory, STMaker can automatically extract the significant semantic behavior of the trajectory and summarize...

10.14778/2733004.2733065 article EN Proceedings of the VLDB Endowment 2014-08-01

Automation in road vehicles is an emerging technology that has developed rapidly over the last decade. Autonomous vehicles (AV) have posed many inter-disciplinary challenges to existing transportation infrastructure. In this paper, we conduct an algorithmic study on when and how an autonomous vehicle should change its lane, which is a fundamental problem in the vehicle automation field and the root cause of most 'phantom' traffic jams. We propose a prediction-and-search framework, called Cheetah (Change lane smart for autonomous vehicle), which aims...

10.1145/3447548.3467072 article EN 2021-08-12

Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets (a.k.a. 'big data') can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in big data systems (e.g., Hadoop). Therefore, we propose a nonparametric accuracy estimation method and system to speed up big data analytics. Our framework is called EARL (Early Accurate Result Library) and it works by predicting...

10.1109/icde.2013.6544928 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01

Recent theoretical advances have enabled the use of special monotonic aggregates in recursion. These special aggregates make possible the concise expression and efficient implementation of a rich new set of advanced applications. Among these applications, graph queries are particularly important because of their pervasiveness in data intensive application areas. In this demonstration, we present our Deductive Application Language (DeAL) System, the first of a new generation of Deductive Database Systems that support applications that could not be expressed...

10.14778/2536274.2536290 article EN Proceedings of the VLDB Endowment 2013-08-01

With the rapid advancement of remote sensing technology, bi-temporal change detection (CD) techniques have also seen significant progress. However, existing CD tasks still face two challenges: 1) variations in lighting and seasonal factors complicate imaging conditions, causing pseudo-variation interference, and 2) the spatial distribution and shapes of buildings are diverse, leading to difficulties in extracting and utilizing effective features. In this paper, we propose a spatial-temporal evolution guided...

10.1109/jstars.2024.3439510 article EN cc-by-nc-nd IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2024-01-01

While Complex Event Processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semistructured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial records, configuration files, and similar applications requiring...

10.1145/2536779 article EN ACM Transactions on Database Systems 2013-11-01

There is a growing interest in query language extensions for pattern matching over event streams and stored database sequences, due to the many important applications that such extensions make possible. The push for such extensions has led DBMS vendors and DSMS venture companies to propose Kleene-closure extensions of SQL standards, building on seminal research that demonstrated the effectiveness and amenability to efficient implementation of such constructs. These extensions, however powerful, suffer from limitations that severely impair their effectiveness in many real-world applications...

10.14778/1920841.1920865 article EN Proceedings of the VLDB Endowment 2010-09-01
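
Kleene closure here means the `*`/`+` repetition operators familiar from regular expressions, applied to sequences of events or tuples. The sketch below is a hedged illustration only: it applies plain Python `re` to a symbol encoding and does not use the SQL extensions the abstract discusses. It finds V-shaped patterns in a price series, defined as one or more falling ticks followed by one or more rising ticks (the pattern `D+U+`). The function names and data are hypothetical.

```python
import re

def classify(prices):
    """Encode each consecutive price move as U (up), D (down) or S (same)."""
    symbols = []
    for prev, cur in zip(prices, prices[1:]):
        symbols.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(symbols)

def v_patterns(prices):
    """Find maximal runs matching D+ U+ (a fall followed by a recovery).

    Move i goes from prices[i] to prices[i + 1], so a match on moves[s:e]
    spans prices[s] .. prices[e]; the (s, e) pairs are returned.
    """
    moves = classify(prices)
    return [(m.start(), m.end()) for m in re.finditer(r"D+U+", moves)]

prices = [10, 9, 8, 9, 11, 11, 10, 12]
print(classify(prices))    # DDUUSDU
print(v_patterns(prices))  # [(0, 4), (5, 7)]
```

The `+` operators make each match as long as possible, so a single match can cover a run of any length. A fixed-length sequence pattern cannot express this.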

Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented increase in the amount of available trajectory data, such that efficient query processing can no longer be supported by a single machine. As a result, means of performing distributed in-memory trajectory similarity search are called for. However, existing proposals either suffer from computing resource...

10.1109/icde51399.2021.00067 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01
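
The abstract does not name a specific similarity measure. The sketch below therefore assumes dynamic time warping (DTW), one commonly used trajectory distance, and answers a top-k query by brute force on a single machine. It only illustrates what a similarity query computes; the paper's subject is distributing this search in memory, which the sketch does not attempt. All names and trajectories are hypothetical.

```python
import math

def dtw(a, b):
    """Dynamic time warping distance between two trajectories of (x, y) points."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def most_similar(query, trajectories, k):
    """Brute-force top-k search: score every trajectory, keep the k closest."""
    scored = [(tid, dtw(query, traj)) for tid, traj in trajectories.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

trajectories = {
    "t1": [(0, 0), (1, 1), (2, 2)],
    "t2": [(0, 0), (1, 0), (2, 0)],
    "t3": [(5, 5), (6, 6)],
}
query = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(most_similar(query, trajectories, k=2))
```

Each DTW call costs O(n·m) point comparisons, and a brute-force query repeats that for every stored trajectory. This quadratic, per-candidate cost is what makes large trajectory collections hard to serve from one machine.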