NFDI4DS | UHH-SEMS - Publication Details

Kai Zeng

ORCID: 0009-0005-5788-5668

About

Contact & Profiles

A5032503782

Research Areas

Data Management and Algorithms
Advanced Database Systems and Queries
Data Stream Mining Techniques
Cloud Computing and Resource Management
Advanced Data Storage Technologies
Semantic Web and Ontologies
Advanced Image and Video Retrieval Techniques
Scientific Computing and Data Management
Stochastic Gradient Optimization Techniques
Time Series Analysis and Forecasting
Graph Theory and Algorithms
Algorithms and Data Compression
Advanced Neural Network Applications
Traffic Prediction and Management Techniques
Recommender Systems and Techniques
Geographic Information Systems Studies
Distributed systems and fault tolerance
Bayesian Modeling and Causal Inference
Target Tracking and Data Fusion in Sensor Networks
Generative Adversarial Networks and Image Synthesis
IoT and Edge/Fog Computing
3D Shape Modeling and Analysis
Autonomous Vehicle Technology and Safety
Radar Systems and Signal Processing
Face recognition and analysis

Huawei Technologies (China)
2023-2025

Kunming University of Science and Technology
2021-2024

Kunming University
2024

Alibaba Group (China)
2019-2023

Alibaba Group (United States)
2020-2023

University of Science and Technology of China
2023

Shenzhen University
2023

Zhejiang A & F University
2006-2023

Huawei Technologies (United Kingdom)
2023

University of Electronic Science and Technology of China
2010-2022

A distributed graph engine for web scale RDF data

OPENALEX - Publications

Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang

Much work has been devoted to supporting RDF data. But state-of-the-art systems and methods still cannot handle web scale RDF data effectively. Furthermore, many useful and general purpose graph-based operations (e.g., random walk, reachability, community discovery) on RDF data are not supported, as most existing systems store and index data in particular ways (e.g., as relational tables or as a bitmap matrix) to maximize one particular operation on RDF data: SPARQL query processing. In this paper, we introduce Trinity.RDF, a distributed, memory-based graph...

10.14778/2535570.2488333 article EN Proceedings of the VLDB Endowment 2013-02-01

Early accurate results for advanced analytics on MapReduce

OPENALEX - Publications

Nikolay Laptev, Kai Zeng, Carlo Zaniolo

Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for 'big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of the degree of accuracy...

10.14778/2336664.2336675 article EN Proceedings of the VLDB Endowment 2012-06-01

The analytical bootstrap

OPENALEX - Publications

Kai Zeng, Shi Gao, Barzan Mozafari, Carlo Zaniolo

Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research that is now made more critical by the need for timely and cost-effective analytics over "Big Data". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second...

10.1145/2588555.2588579 article EN 2014-06-18

G-OLA

OPENALEX - Publications

Kai Zeng, Sameer Agarwal, Ankur Dave, Michael Armbrust, Ion Stoica

Nearly 15 years ago, Hellerstein, Haas and Wang proposed online aggregation (OLA), a technique that allows users to (1) observe the progress of a query by showing iteratively refined approximate answers, and (2) stop the query execution once its result achieves the desired accuracy. In this demonstration, we present G-OLA, a novel mini-batch execution model that generalizes OLA to support general OLAP queries with arbitrarily nested aggregates using efficient delta maintenance techniques. We have implemented G-OLA in FluoDB,...

10.1145/2723372.2735381 article EN 2015-05-27

Cardinality estimation in DBMS

OPENALEX - Publications

Yuxing Han, Zi‐Niu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, and 9 more

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answers the fundamental problem: to what extent can these methods improve the performance of the query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we...

10.14778/3503585.3503586 article EN Proceedings of the VLDB Endowment 2021-12-01

FLAT

OPENALEX - Publications

Rong Zhu, Zi‐Niu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, and 3 more

Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over-simplify the models by only using independent factorization, which leads to inaccurate estimates, or over-complicate them by lossless conditional factorization without any independence assumption, which results in slow probability computation. In this paper, we propose FLAT, a...

10.14778/3461535.3461539 article EN Proceedings of the VLDB Endowment 2021-05-01

LEON: A New Framework for ML-Aided Query Optimization

OPENALEX - Publications

Xu Chen, Haitian Chen, Zibo Liang, Shuncheng Liu, Jinghong Wang, and 3 more

Query optimization has long been a fundamental yet challenging topic in the database field. With the prosperity of machine learning (ML), some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizers. However, they suffer from limitations due to the data-driven nature of ML. Motivated by the ML characteristics and database maturity, we propose LEON, a framework for ML-aidEd query OptimizatioN. LEON improves the expert query optimizer to self-adjust to the particular deployment by leveraging ML and the fundamental knowledge in the expert query optimizer. To train the ML model,...

10.14778/3598581.3598597 article EN Proceedings of the VLDB Endowment 2023-05-01

Deep Learning in Forest Tree Species Classification Using Sentinel-2 on Google Earth Engine: A Case Study of Qingyuan County

OPENALEX - Publications

Tao He, Houkui Zhou, Caiyao Xu, Junguo Hu, Xingyu Xue, and 4 more

Forest tree species information plays an important role in ecology and forest management, and deep learning has been used widely for remote sensing image classification in recent years. However, forest tree species classification using remote sensing images is still a difficult task. Since there is no benchmark dataset for forest tree species, a forest tree species dataset (FTSD) was built in this paper to fill the gap based on Sentinel-2 images. The FTSD contained nine kinds of forest tree species in Qingyuan County with 8,815 images, each with a resolution of 64 × 64 pixels. The images were produced by combining forest management inventory data...

10.3390/su15032741 article EN Sustainability 2023-02-02

High-performance complex event processing over XML streams

OPENALEX - Publications

Barzan Mozafari, Kai Zeng, Carlo Zaniolo

Much research attention has been given to delivering high-performance systems that are capable of complex event processing (CEP) in a wide range of applications. However, many current CEP systems focus on efficiently processing data having a simple structure, and are otherwise limited in their ability to support continuous queries on structured or semi-structured information. XML streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial records, configuration files, and similar...

10.1145/2213836.2213866 article EN 2012-05-20

Chi

OPENALEX - Publications

Luo Mai, Kai Zeng, Rahul Potharaju, Le Xu, Steve Suh, and 7 more

Stream-processing workloads and modern shared cluster environments exhibit high variability and unpredictability. Combined with the large parameter space and the diverse set of user SLOs, this makes streaming systems very challenging to statically configure and tune. To address these issues, in this paper we investigate a novel control-plane design, Chi, which supports continuous monitoring and feedback, and enables dynamic re-configuration. Chi leverages the key insight of embedding control-plane messages in the data-plane channels to achieve...

10.14778/3231751.3231765 article EN Proceedings of the VLDB Endowment 2018-06-01

Trustworthy Blockchain-Empowered Collaborative Edge Computing-as-a-Service Scheduling and Data Sharing in the IIoE

OPENALEX - Publications

Fenhua Bai, Tao Shen, Zhuo Yu, Kai Zeng, Bei Gong

Owing to the technology of 5G and beyond, collaborative edge computing-as-a-service has enabled trillions of interconnected applications. It has also become a prospective paradigm for providing computing services by offloading computationally intensive assignments to mobile-edge servers or fog nodes due to the terminals' constrained computing and caching resources. Nevertheless, in this process, the trust of scheduling and data sharing in heterogeneous systems is an unavoidable challenge of paramount importance. As a powerful tool that...

10.1109/jiot.2021.3058125 article EN IEEE Internet of Things Journal 2021-02-10

Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve

OPENALEX - Publications

Xu Chen, Shuncheng Liu, Tong Yuan, Tao Ye, Kai Zeng, and 2 more

In the realm of big data and cloud analytics, efficiently managing and retrieving high-dimensional data presents a critical challenge. Traditional indexes often struggle with the storage overhead inherent in large datasets. There is a growing interest in the adoption of Small Materialize Aggregation (SMA) among database vendors due to its ability to maintain lightweight block-level metadata, facilitating efficient block skipping. However, SMA performance relies heavily on data layout. This is especially critical in scenarios with wide tables...

10.1145/3709710 article EN Proceedings of the ACM on Management of Data 2025-02-10

Making sense of trajectory data: A partition-and-summarization approach

OPENALEX - Publications

Han Su, Kai Zheng, Kai Zeng, Jiamin Huang, Shazia Sadiq, and 2 more

Due to the prevalence of GPS-enabled devices and wireless communication technology, spatial trajectories that describe the movement history of moving objects are being generated and accumulated at an unprecedented pace. However, a raw trajectory in the form of a sequence of timestamped locations does not make much sense for humans without semantic representation. In this work we aim to facilitate human understanding of a raw trajectory by automatically generating a short text to describe it. By formulating this task as the problem of adaptive trajectory segmentation...

10.1109/icde.2015.7113348 article EN 2015-04-01

iOLAP

OPENALEX - Publications

Kai Zeng, Sameer Agarwal, Ion Stoica

The size of data and the complexity of analytics continue to grow along with the need for timely and cost-effective analysis. However, the growth of computation power cannot keep up with the growth of data. This calls for a paradigm shift from the traditional batch OLAP processing model to an incremental OLAP processing model. In this paper, we propose iOLAP, an incremental OLAP query engine that provides a smooth trade-off between query accuracy and latency, and fulfills a full spectrum of user requirements from approximate but timely query execution to a more accurate query execution. iOLAP enables interactive incremental query processing using...

10.1145/2882903.2915240 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14

BASE: Bridging the Gap between Cost and Latency for Query Optimization

OPENALEX - Publications

Xu Chen, Zhen Wang, Shuncheng Liu, Yaliang Li, Kai Zeng, and 4 more

Some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizers. These works often use the cost (i.e., the estimation of a cost model) or the latency (i.e., execution time) as guidance signals for training their learned models. However, cost-based learning underperforms in latency and latency-based learning is time-intensive. In order to bypass such a dilemma, researchers attempt to transfer a learned value network from the cost domain to the latency domain. We recognize critical insights in cost/latency-based training, prompting us to transfer the reward function rather...

10.14778/3594512.3594525 article EN Proceedings of the VLDB Endowment 2023-04-01

ABS

OPENALEX - Publications

Kai Zeng, Shi Gao, Jiaqi Gu, Barzan Mozafari, Carlo Zaniolo

Approximate Query Processing (AQP) based on sampling is critical for supporting timely and cost-effective analytics over big data. To be applied successfully, AQP must be accompanied by reliable estimates on the quality of sample-produced approximate answers; the two main techniques used in the past for this purpose are (i) closed-form analytic error estimation, and (ii) the bootstrap method. Approach (i) is extremely efficient but lacks generality, whereas (ii) is general but suffers from high computational overhead. Our recently...

10.1145/2588555.2594532 article EN 2014-06-18

FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters

OPENALEX - Publications

Wenqi Jiang, Zhenhao He, Shuai Zhang, Kai Zeng, Feng Liang, and 6 more

We present FleetRec, a high-performance and scalable recommendation inference system within tight latency constraints. FleetRec takes advantage of heterogeneous hardware including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory to different types of hardware and bridging their connections by high-speed network, FleetRec gains the best of both worlds, and can naturally scale out by adding nodes to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms optimized...

10.1145/3447548.3467139 article EN 2021-08-12

STMaker

OPENALEX - Publications

Han Su, Kai Zheng, Kai Zeng, Jiamin Huang, Xiaofang Zhou

Wide adoption of GPS-enabled devices generates large amounts of trajectories every day. The raw trajectory data describes the movement history of moving objects by a sequence of <longitude, latitude, time-stamp> triples, which are nonintuitive for humans to perceive the prominent features of the trajectory, such as where and how the moving object travels. In this demo, we present the STMaker system to help users make sense of individual trajectories. Given a trajectory, STMaker can automatically extract the significant semantic behavior of the trajectory and summarize...

10.14778/2733004.2733065 article EN Proceedings of the VLDB Endowment 2014-08-01

Lane Change Scheduling for Autonomous Vehicle: A Prediction-and-Search Framework

OPENALEX - Publications

Shuncheng Liu, Han Su, Yan Zhao, Kai Zeng, Kai Zheng

Automation in road vehicles is an emerging technology that has developed rapidly over the last decade. There have been many inter-disciplinary challenges posed on existing transportation infrastructure by autonomous vehicles (AV). In this paper, we conduct an algorithmic study on when and how an autonomous vehicle should change its lane, which is a fundamental problem in the vehicle automation field and the root cause of most 'phantom' traffic jams. We propose a prediction-and-search framework, called Cheetah (Change lane smart for autonomous vehicle), which aims...

10.1145/3447548.3467072 article EN 2021-08-12

Very fast estimation for result and accuracy of big data analytics: The EARL system

OPENALEX - Publications

Nikolay Laptev, Kai Zeng, Carlo Zaniolo

Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets (a.k.a. 'big data') can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in big data systems (e.g., Hadoop). Therefore, we propose a nonparametric accuracy estimation method and system to speed up big data analytics. Our framework is called EARL (Early Accurate Result Library) and it works by predicting...

10.1109/icde.2013.6544928 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01

Graph queries in a next-generation Datalog system

OPENALEX - Publications

Alexander Shkapsky, Kai Zeng, Carlo Zaniolo

Recent theoretical advances have enabled the use of special monotonic aggregates in recursion. These special aggregates make possible the concise expression and efficient implementation of a rich new set of advanced applications. Among these applications, graph queries are particularly important because of their pervasiveness in data intensive application areas. In this demonstration, we present our Deductive Application Language (DeAL) System, the first of a new generation of Deductive Database Systems that support applications that could not be expressed...

10.14778/2536274.2536290 article EN Proceedings of the VLDB Endowment 2013-08-01

Spatial-Temporal Evolution Guided Change Detection Network for Remote Sensing Images

OPENALEX - Publications

Qingwang Wang, Hong Zheng, Jiangbo Huang, Xiaobin Zhao, Jian Song, and 3 more

With the rapid advancement of remote sensing technology, bi-temporal change detection (CD) techniques have also seen significant progress. However, existing CD tasks still face two challenges: 1) variations in lighting and seasonal factors complicate imaging conditions, causing pseudo-variation interference, and 2) the spatial distribution and shapes of buildings are diverse, leading to difficulties in extracting and utilizing effective features. In this paper, we propose a spatial-temporal evolution guided...

10.1109/jstars.2024.3439510 article EN cc-by-nc-nd IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2024-01-01

High-performance complex event processing over hierarchical data

OPENALEX - Publications

Barzan Mozafari, Kai Zeng, Loris D’Antoni, Carlo Zaniolo

While Complex Event Processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semistructured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring...

10.1145/2536779 article EN ACM Transactions on Database Systems 2013-11-01

From regular expressions to nested words

OPENALEX - Publications

Barzan Mozafari, Kai Zeng, Carlo Zaniolo

There is growing interest in query language extensions for pattern matching over event streams and stored database sequences, due to the many important applications that such extensions make possible. The push for such extensions has led DBMS vendors and DSMS venture companies to propose Kleene-closure extensions of SQL standards, building on seminal research that demonstrated the effectiveness and amenability to efficient implementation of such constructs. These extensions, however powerful, suffer from limitations that severely impair their effectiveness in many real-world applications....

10.14778/1920841.1920865 article EN Proceedings of the VLDB Endowment 2010-09-01

REPOSE: Distributed Top-k Trajectory Similarity Search with Local Reference Point Tries

OPENALEX - Publications

Bolong Zheng, Lianggui Weng, Xi Zhao, Kai Zeng, Xiaofang Zhou, and 1 more

Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented increase in the amount of available trajectory data such that efficient query processing can no longer be supported by a single machine. As a result, means of performing distributed in-memory trajectory similarity search are called for. However, existing distributed proposals either suffer from computing resource...

10.1109/icde51399.2021.00067 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01
