- Data Management and Algorithms
- Advanced Database Systems and Queries
- Data Stream Mining Techniques
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Semantic Web and Ontologies
- Advanced Image and Video Retrieval Techniques
- Scientific Computing and Data Management
- Stochastic Gradient Optimization Techniques
- Time Series Analysis and Forecasting
- Graph Theory and Algorithms
- Algorithms and Data Compression
- Advanced Neural Network Applications
- Traffic Prediction and Management Techniques
- Recommender Systems and Techniques
- Geographic Information Systems Studies
- Distributed Systems and Fault Tolerance
- Bayesian Modeling and Causal Inference
- Target Tracking and Data Fusion in Sensor Networks
- Generative Adversarial Networks and Image Synthesis
- IoT and Edge/Fog Computing
- 3D Shape Modeling and Analysis
- Autonomous Vehicle Technology and Safety
- Radar Systems and Signal Processing
- Face Recognition and Analysis
Huawei Technologies (China)
2023-2025
Kunming University of Science and Technology
2021-2024
Kunming University
2024
Alibaba Group (China)
2019-2023
Alibaba Group (United States)
2020-2023
University of Science and Technology of China
2023
Shenzhen University
2023
Zhejiang A & F University
2006-2023
Huawei Technologies (United Kingdom)
2023
University of Electronic Science and Technology of China
2010-2022
Much work has been devoted to supporting RDF data, but state-of-the-art systems and methods still cannot handle web-scale RDF data effectively. Furthermore, many useful general-purpose graph-based operations (e.g., random walk, reachability, community discovery) on RDF data are not supported, as most existing systems store and index data in particular ways (e.g., as relational tables or a bitmap matrix) to maximize one operation on RDF data: SPARQL query processing. In this paper, we introduce Trinity.RDF, a distributed, memory-based graph...
Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for 'big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop that allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of the degree of accuracy...
Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research that is now made more critical by the need for timely and cost-effective analytics over "Big Data". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second...
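The two error-estimation approaches contrasted in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's method: the function names, the synthetic Gaussian data, and the resample count are assumptions made for illustration. It shows a closed-form standard error that only applies to the mean, next to a bootstrap estimate that accepts any estimator but recomputes it on every resample.

```python
import random
import statistics


def analytic_std_error_of_mean(sample):
    """Closed-form standard error of the sample mean (cheap, but mean-only)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


def bootstrap_std_error(sample, estimator, n_resamples=200, seed=0):
    """Standard error of an arbitrary estimator via resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Draw n items with replacement and re-run the estimator on them.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "big" table column and a 1% uniform sample of it.
    population = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
    sample = rng.sample(population, 1_000)

    print("approximate AVG       :", statistics.fmean(sample))
    print("analytic SE (mean)    :", analytic_std_error_of_mean(sample))
    print("bootstrap SE (mean)   :", bootstrap_std_error(sample, statistics.fmean))
    # The bootstrap also covers estimators with no simple closed form.
    print("bootstrap SE (median) :", bootstrap_std_error(sample, statistics.median))
```

The bootstrap call costs `n_resamples` full evaluations of the estimator, which is the overhead that the generality is traded against.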
Nearly 15 years ago, Hellerstein, Haas and Wang proposed online aggregation (OLA), a technique that allows users to (1) observe the progress of a query by showing iteratively refined approximate answers, and (2) stop the query execution once its result achieves the desired accuracy. In this demonstration, we present G-OLA, a novel mini-batch execution model that generalizes OLA to support general OLAP queries with arbitrarily nested aggregates using efficient delta maintenance techniques. We have implemented G-OLA in FluoDB,...
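As a rough illustration of the two OLA properties listed above (iteratively refined answers and early stopping), the following Python sketch maintains a running AVG over mini-batches and stops once a confidence interval is narrow enough. It covers only a single flat aggregate, not G-OLA's nested aggregates or delta maintenance. The function name, the normal-approximation interval, and the assumption that rows arrive in random order are all illustrative choices, not details from the paper.

```python
import math
import random


def online_average(values, batch_size, target_half_width, z=1.96):
    """Yield (rows_seen, running_avg, ci_half_width) after each mini-batch.

    Assumes `values` is already in random order, so every prefix is a
    uniform sample. Stops early once the interval is tight enough.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's update)
    for start in range(0, len(values), batch_size):
        for x in values[start:start + batch_size]:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if n > 1:
            # Normal-approximation half-width: z * s / sqrt(n).
            half_width = z * math.sqrt(m2 / (n - 1) / n)
        else:
            half_width = math.inf
        yield n, mean, half_width
        if half_width <= target_half_width:
            return  # desired accuracy reached; skip the remaining rows


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.uniform(0, 100) for _ in range(20_000)]
    for rows, avg, hw in online_average(data, batch_size=500, target_half_width=1.0):
        print(f"{rows:>6} rows  AVG ~ {avg:7.3f} +/- {hw:.3f}")
```

Each mini-batch only updates the running state (`n`, `mean`, `m2`) instead of rescanning earlier rows, which is the simplest form of incremental maintenance.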
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for the query optimizer in a DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answers the fundamental problem: to what extent can these methods improve the performance of the query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we...
Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over-simplify the models by only using independent factorization, which leads to inaccurate estimates, or over-complicate them by lossless conditional factorization without any independence assumption, which results in slow probability computation. In this paper, we propose FLAT, a...
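The inaccuracy attributed above to independent factorization can be seen in a toy Python example. This is an illustrative sketch only, not FLAT: the helper names and the two-column table are assumptions. The estimator multiplies per-column marginal selectivities, which is exact when columns are independent and wrong when they are correlated.

```python
from collections import Counter


def true_selectivity(rows, equalities):
    """Fraction of rows matching every `column index == value` predicate."""
    matches = sum(
        1 for row in rows
        if all(row[col] == value for col, value in equalities.items())
    )
    return matches / len(rows)


def independent_selectivity(rows, equalities):
    """Estimate the same fraction as a product of per-column marginals."""
    n = len(rows)
    selectivity = 1.0
    for col, value in equalities.items():
        marginal = Counter(row[col] for row in rows)
        selectivity *= marginal[value] / n
    return selectivity


if __name__ == "__main__":
    # Hypothetical table where city fully determines country.
    rows = [("Paris", "FR")] * 50 + [("Berlin", "DE")] * 50
    query = {0: "Paris", 1: "FR"}  # city = 'Paris' AND country = 'FR'

    print(true_selectivity(rows, query))         # 0.5
    print(independent_selectivity(rows, query))  # 0.25
```

Here the independence assumption underestimates the result by a factor of two; with more correlated predicates the error compounds multiplicatively.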
Query optimization has long been a fundamental yet challenging topic in the database field. With the prosperity of machine learning (ML), some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizers. However, they suffer from limitations due to the data-driven nature of ML. Motivated by the ML characteristics and database maturity, we propose LEON, a framework for ML-aidEd query OptimizatioN. LEON improves the expert query optimizer to self-adjust to the particular deployment by leveraging ML and the knowledge in the expert optimizer. To train the ML model,...
Forest tree species information plays an important role in ecology and forest management, and deep learning has been used widely for remote sensing image classification in recent years. However, forest tree species classification using remote sensing images is still a difficult task. Since there is no benchmark dataset for forest tree species, a forest tree species dataset (FTSD) was built in this paper to fill the gap based on Sentinel-2 images. The FTSD contained nine kinds of forest tree species in Qingyuan County with 8,815 images, each with a resolution of 64 × 64 pixels. The images were produced by combining forest management inventory data...
Much research attention has been given to delivering high-performance systems that are capable of complex event processing (CEP) in a wide range of applications. However, many current CEP systems focus on efficiently processing data having a simple structure, and are otherwise limited in their ability to support continuous queries on structured or semi-structured information. XML streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial records, configuration files, and similar...
Stream-processing workloads and modern shared cluster environments exhibit high variability and unpredictability. Combined with the large parameter space and the diverse set of user SLOs, this makes streaming systems very challenging to statically configure and tune. To address these issues, in this paper we investigate a novel control-plane design, Chi, which supports continuous monitoring and feedback, and enables dynamic re-configuration. Chi leverages the key insight of embedding control-plane messages in the data-plane channels to achieve...
Owing to the technology of 5G and beyond, collaborative edge computing-as-a-service has enabled trillions of interconnected applications. It has also become a prospective paradigm for providing computing services by offloading computationally intensive assignments to mobile-edge servers or fog nodes due to the terminals' constrained caching resources. Nevertheless, in this process, trust in scheduling and data sharing across heterogeneous systems is an unavoidable challenge of paramount importance. As a powerful tool that...
In the realm of big data and cloud analytics, efficiently managing and retrieving high-dimensional data presents a critical challenge. Traditional indexes often struggle with the storage overhead inherent in large datasets. There is growing interest in the adoption of Small Materialize Aggregation (SMA) among database vendors due to its ability to maintain lightweight block-level metadata, facilitating efficient block skipping. However, SMA performance relies heavily on data layout. This is especially critical in scenarios with wide tables...
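The block-skipping idea and its dependence on data layout can be sketched in a few lines of Python. This is a simplified, single-column illustration under stated assumptions: the function names, the fixed block size, and the choice of min/max as the only per-block aggregates are not taken from the paper.

```python
import random


def build_sma(values, block_size):
    """Return lightweight per-block metadata: (start, end, min, max)."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append((start, start + len(chunk), min(chunk), max(chunk)))
    return blocks


def range_scan(values, sma, lo, hi):
    """Return (values in [lo, hi], number of blocks that had to be read)."""
    hits, blocks_read = [], 0
    for start, end, block_min, block_max in sma:
        if block_max < lo or block_min > hi:
            continue  # metadata proves the block holds no match: skip it
        blocks_read += 1
        hits.extend(v for v in values[start:end] if lo <= v <= hi)
    return hits, blocks_read


if __name__ == "__main__":
    sorted_col = list(range(1000))
    shuffled_col = sorted_col[:]
    random.Random(0).shuffle(shuffled_col)

    for name, col in (("sorted", sorted_col), ("shuffled", shuffled_col)):
        sma = build_sma(col, block_size=100)
        hits, blocks_read = range_scan(col, sma, 250, 349)
        print(f"{name:>8}: {len(hits)} hits, {blocks_read} of {len(sma)} blocks read")
```

On the sorted layout only the two blocks overlapping the range are read, while the shuffled layout gives every block a wide min/max span, so almost nothing can be skipped.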
Due to the prevalence of GPS-enabled devices and wireless communication technology, spatial trajectories that describe the movement history of moving objects are being generated and accumulated at an unprecedented pace. However, a raw trajectory in the form of a sequence of timestamped locations does not make much sense for humans without semantic representation. In this work we aim to facilitate human understanding of a raw trajectory by automatically generating a short text to describe it. By formulating this task as a problem of adaptive trajectory segmentation...
The size of data and the complexity of analytics continue to grow along with the need for timely and cost-effective analysis. However, the growth of computation power cannot keep up with the growth of data. This calls for a paradigm shift from the traditional batch OLAP processing model to an incremental OLAP processing model. In this paper, we propose iOLAP, an incremental OLAP query engine that provides a smooth trade-off between query accuracy and latency, and fulfills a full spectrum of user requirements, from approximate but timely query execution to more accurate query execution. iOLAP enables interactive incremental query processing using...
Some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizers. These works often use the cost (i.e., the estimation of a cost model) or the latency (i.e., execution time) as guidance signals for training their learned models. However, cost-based training underperforms in latency, and latency-based training is time-intensive. In order to bypass such a dilemma, researchers attempt to transfer a learned value network from the cost domain to the latency domain. We recognize critical insights in cost/latency-based training, prompting us to transfer the reward function rather...
Approximate Query Processing (AQP) based on sampling is critical for supporting timely and cost-effective analytics over big data. To be applied successfully, AQP must be accompanied by reliable estimates of the quality of sample-produced approximate answers; the two main techniques used in the past for this purpose are (i) closed-form analytic error estimation, and (ii) the bootstrap method. Approach (i) is extremely efficient but lacks generality, whereas (ii) is general but suffers from high computational overhead. Our recently...
We present FleetRec, a high-performance and scalable recommendation inference system within tight latency constraints. FleetRec takes advantage of heterogeneous hardware including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory to different types of hardware and bridging their connections by a high-speed network, FleetRec gains the best of both worlds, and can naturally scale out by adding nodes to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms optimized...
Wide adoption of GPS-enabled devices generates large amounts of trajectories every day. The raw trajectory data describes the movement history of moving objects by a sequence of <longitude, latitude, time-stamp> triples, from which it is nonintuitive for humans to perceive the prominent features of the trajectory, such as where and how the moving object travels. In this demo, we present the STMaker system to help users make sense of individual trajectories. Given a trajectory, STMaker can automatically extract the significant semantic behavior of the trajectory and summarize...
Automation in road vehicles is an emerging technology that has developed rapidly over the last decade. There have been many inter-disciplinary challenges posed on existing transportation infrastructure by autonomous vehicles (AV). In this paper, we conduct an algorithmic study on when and how an autonomous vehicle should change its lane, which is a fundamental problem in the vehicle automation field and the root cause of most 'phantom' traffic jams. We propose a prediction-and-search framework, called Cheetah (Change lane smart for autonomous vehicle), which aims...
Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets (a.k.a. 'big data') can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in big data systems (e.g., Hadoop). Therefore, we propose a nonparametric accuracy estimation method and system to speed up big data analytics. Our framework is called EARL (Early Accurate Result Library) and it works by predicting...
Recent theoretical advances have enabled the use of special monotonic aggregates in recursion. These special aggregates make possible the concise expression and efficient implementation of a rich new set of advanced applications. Among these applications, graph queries are particularly important because of their pervasiveness in data-intensive application areas. In this demonstration, we present our Deductive Application Language (DeAL) System, the first of a new generation of Deductive Database Systems that support applications that could not be expressed...
With the rapid advancement of remote sensing technology, bi-temporal change detection (CD) techniques have also seen significant progress. However, existing CD tasks still face two challenges: 1) variations in lighting and seasonal factors complicate imaging conditions, causing pseudo-variation interference, and 2) the spatial distribution and shapes of buildings are diverse, leading to difficulties in extracting and utilizing effective features. In this paper, we propose a spatial-temporal evolution guided...
While Complex Event Processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semistructured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial configuration files, and similar applications requiring...
There is growing interest in query language extensions for pattern matching over event streams and stored database sequences, due to the many important applications that such extensions make possible. The push for such extensions has led DBMS vendors and DSMS venture companies to propose Kleene-closure extensions of SQL standards, building on seminal research that demonstrated the effectiveness and amenability to efficient implementation of such constructs. These extensions, however powerful, suffer from limitations that severely impair their effectiveness in many real-world applications...
Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented increase in the amount of available trajectory data, such that efficient query processing can no longer be supported by a single machine. As a result, means of performing distributed in-memory trajectory similarity search are called for. However, existing proposals either suffer from computing resource...