NFDI4DS | UHH-SEMS - Publication Details

Anqun Pan

ORCID: 0000-0002-6756-149X

About

Contact & Profiles

A5001037413

Research Areas

Distributed systems and fault tolerance
Advanced Data Storage Technologies
Advanced Database Systems and Queries
Algorithms and Data Compression
Parallel Computing and Optimization Techniques
Service-Oriented Architecture and Web Services
Data Management and Algorithms
Cloud Computing and Resource Management
Data Quality and Management
Network Packet Processing and Optimization
Topic Modeling
Age of Information Optimization
Software System Performance and Reliability
Security and Verification in Computing
Distributed and Parallel Computing Systems
Caching and Content Delivery
Semantic Web and Ontologies
Recommender Systems and Techniques
Advanced Graph Neural Networks
Image Retrieval and Classification Techniques
Biomedical Text Mining and Ontologies
Access Control and Trust
Web Data Mining and Analysis
Data Visualization and Analytics
Cognitive Functions and Memory

Tencent (China)
2018-2025

FluidKV: Seamlessly Bridging the Gap between Indexing Performance and Memory-Footprint on Ultra-Fast Storage

OPENALEX - Publications

Ziyi Lu, Qiang Cao, Hong Jiang, Yuxing Chen, Jie Yao, and 1 more

Our extensive experiments reveal that existing key-value stores (KVSs) achieve high performance at the expense of a huge memory footprint that is often impractical or unacceptable. Even with emerging ultra-fast byte-addressable persistent memory (PM), KVSs fall far short of delivering the performance promised by PM's superior I/O bandwidth. To find the root causes and bridge the performance/memory-footprint gap, we revisit the architectural features of two representative indexing mechanisms (single-stage and multi-stage) and propose a three-stage...

10.14778/3648160.3648177 article EN Proceedings of the VLDB Endowment 2024-02-01

A Generic Specification Framework for Weakly Consistent Replicated Data Types

OPENALEX - Publications

Xue Jiang, Hengfeng Wei, Yu Huang, Yuxing Chen, Anqun Pan

10.1109/tpds.2025.3533546 article EN IEEE Transactions on Parallel and Distributed Systems 2025-01-01

TxnSails: Achieving Serializable Transaction Scheduling with Self-Adaptive Isolation Level Selection

OPENALEX - Publications

Q. Zhuang, Wei Lü, Shuang Liu, Yuxing Chen, Xu Shi, and 4 more

Achieving the serializable isolation level, regarded as the gold standard for transaction processing, is costly. Recent studies reveal that adjusting specific query patterns within a workload can still achieve serializability even at lower isolation levels. Nevertheless, these studies typically overlook the trade-off between the performance advantages of lower isolation levels and the overhead required to maintain serializability, potentially leading to suboptimal isolation level choices that fail to maximize performance. In this paper, we present TxnSails,...

10.48550/arxiv.2502.00991 preprint EN arXiv (Cornell University) 2025-02-02

SALI: A Scalable Adaptive Learned Index Framework based on Probability Models

OPENALEX - Publications

Jiake Ge, Huanchen Zhang, Boyu Shi, Yuanhui Luo, Yunda Guo, and 3 more

The growth in data storage capacity and the increasing demands for high performance have created several challenges for concurrent indexing structures. One promising solution is the learned index, which uses a learning-based approach to fit the distribution of stored data and predictively locate target keys, significantly improving lookup performance. Despite their advantages, prevailing learned indexes exhibit constraints and encounter scalability issues on multi-core data storage. This paper introduces SALI, the Scalable Adaptive...

10.1145/3626752 article EN Proceedings of the ACM on Management of Data 2023-12-08

Efficient Black-Box Checking of Snapshot Isolation in Databases

OPENALEX - Publications

Kaile Huang, Si Liu, Zhenge Chen, Hengfeng Wei, David Basin, and 2 more

Snapshot isolation (SI) is a prevalent weak isolation level that avoids the performance penalty imposed by serializability and simultaneously prevents various undesired data anomalies. Nevertheless, SI anomalies have recently been found in production cloud databases that claim to provide the SI guarantee. Given the complex and often unavailable internals of such databases, a black-box SI checker is highly desirable. In this paper we present PolySI, a black-box checker that efficiently checks SI and provides understandable counterexamples upon detecting...

10.14778/3583140.3583145 article EN Proceedings of the VLDB Endowment 2023-02-01

G-Learned Index: Enabling Efficient Learned Index on GPU

OPENALEX - Publications

Jiesong Liu, Feng Zhang, Lu Lv, Qi Chang, Xiaoguang Guo, and 8 more

AI and GPU technologies have been widely applied to solve big data problems. The total data volume worldwide reached 200 zettabytes in 2022. How to efficiently index the required content among massive data has become a serious problem. Recently, a promising learned index has been proposed to address this challenge: it offers extremely high efficiency while retaining marginal space overhead. However, we notice that previous learned indexes have mainly focused on CPU architecture, ignoring the advantages of GPU. Because traditional indexes like B-Tree, LSM, bitmap...

10.1109/tpds.2024.3381214 article EN IEEE Transactions on Parallel and Distributed Systems 2024-04-02

Context-Aware Semantic Type Identification for Relational Attributes

OPENALEX - Publications

Yue Ding, Yuhe Guo, Wei Lü, Haixiang Li, Meihui Zhang, and 3 more

10.1007/s11390-021-1048-y article EN Journal of Computer Science and Technology 2023-07-01

A lightweight and efficient temporal database management system in TDSQL

OPENALEX - Publications

Wei Lü, Zhanhao Zhao, Xiaoyu Wang, Haixiang Li, Zhenmiao Zhang, and 4 more

Driven by the recent adoption of temporal expressions into SQL:2011, extensions for temporal support in conventional database management systems (a.k.a. DBMSs) have re-emerged as a research hotspot. In this paper, we present a lightweight yet efficient built-in implementation in Tencent's distributed database system, namely TDSQL. The novelty of TDSQL's implementation includes: (1) a new data model with extension, (2) various optimizations, which are also applicable to other DBMSs, and (3) low storage consumption, as only changes are maintained....

10.14778/3352063.3352122 article EN Proceedings of the VLDB Endowment 2019-08-01

Data-Aware Adaptive Compression for Stream Processing

OPENALEX - Publications

Yu Zhang, Feng Zhang, Hourun Li, Shuhao Zhang, Xiaoguang Guo, and 3 more

Stream processing has been in widespread use, and one of the most common application scenarios is SQL query on streams. By 2021, the global deployment of IoT endpoints reached 12.3 billion, indicating a surge in data generation. However, the escalating demands for high throughput and low latency in stream processing systems have posed significant challenges due to the increasing data volume and evolving user requirements. We present a compression-based stream processing engine, called CompressStreamDB, which enables adaptive fine-grained stream processing directly...

10.1109/tkde.2024.3377710 article EN IEEE Transactions on Knowledge and Data Engineering 2024-03-19

T-SQL: A Lightweight Implementation to Enable Built-in Temporal Support in MVCC-based RDBMSs

OPENALEX - Publications

Zhanhao Zhao, Wei Lü, Hongyao Zhao, Zongyan He, Haixiang Li, and 2 more

The adoption of temporal expressions into SQL:2011 has continuously driven extensions for temporal support in relational database systems (a.k.a. RDBMSs). In this paper, we present T-SQL, a lightweight yet efficient built-in implementation in RDBMSs. T-SQL completely relies on multi-version concurrency control (MVCC), which is widely adopted in RDBMSs to manage data. For data, current records are maintained in legacy databases, and historical records, i.e., previous versions (if any), used to be periodically...

10.1109/tkde.2021.3081717 article EN IEEE Transactions on Knowledge and Data Engineering 2021-01-01

Compressed Data Direct Computing for Databases

OPENALEX - Publications

Weitao Wan, Feng Zhang, Chenyang Zhang, Mingde Zhang, Jidong Zhai, and 7 more

Directly performing operations on compressed data has been proven to be a big success in facing Big Data problems in modern data management systems. These systems have demonstrated significant compression benefits and performance improvement for data analytics applications. However, current systems only focus on queries, while a complete system must support both query and manipulation. To solve this problem, we develop CompressDB, which is a new storage engine that can support data processing for databases without decompression. CompressDB...

10.1109/tkde.2023.3316274 article EN IEEE Transactions on Knowledge and Data Engineering 2023-09-18

Homomorphic Compression: Making Text Processing on Compression Unlimited

OPENALEX - Publications

Jiawei Guan, Feng Zhang, Siqi Ma, Kuangyu Chen, Yihua Hu, and 3 more

Lossless data compression is an effective way to handle the huge transmission and storage overhead of massive text data. Its utility is even more significant today, when data volumes are skyrocketing. The concept of operating on compressed data infuses new blood into efficient text management by enabling mainly access-oriented text processing tasks to be done directly on compressed data without decompression. Facing limitations of existing schemes such as limited types of operations supported, low efficiency, and high space occupation, we address...

10.1145/3626765 article EN Proceedings of the ACM on Management of Data 2023-12-08

RCBench: an RDMA-enabled transaction framework for analyzing concurrency control algorithms

OPENALEX - Publications

Hongyao Zhao, Jingyao Li, Wei Lü, Qian Zhang, Wanqing Yang, and 5 more

10.1007/s00778-023-00821-0 article EN The VLDB Journal 2023-12-14

MSQL+

OPENALEX - Publications

Wei Lü, Xinyi Zhang, Zhiyu Shui, Zhe Peng, Xiao Zhang, and 5 more

Similarity search is a primitive operation in various database applications. Thus far, a large number of access methods have been proposed to accelerate similarity query processing. Nonetheless, these methods mostly focus on developing standalone systems by proposing new indices. Given the fact that existing RDBMSs merely support traditional indices, it is of great necessity and practical importance to develop a standard built-in index-based approach for speeding up similarity query processing. In this demonstration, we introduce MSQL+, a plugin...

10.14778/3229863.3236237 article EN Proceedings of the VLDB Endowment 2018-08-01

Lion: Minimizing Distributed Transactions through Adaptive Replica Provision (Extended Version)

OPENALEX - Publications

Qiushi Zheng, Zhanhao Zhao, Wei Lü, Chang Yao, Yuxing Chen, and 2 more

Distributed transaction processing often involves multiple rounds of cross-node communications, and therefore tends to be slow. To improve performance, existing approaches convert distributed transactions into single-node transactions by either migrating co-accessed partitions onto the same nodes or establishing a super node housing replicas of the entire database. However, migration-based methods might cause transactions to be blocked while waiting for data migration, while the super node can become a bottleneck. In this paper, we present Lion,...

10.48550/arxiv.2403.11221 preprint EN arXiv (Cornell University) 2024-03-17

Combining Optimal Transport and Embedding-Based Approaches for More Expressiveness in Unsupervised Graph Alignment

OPENALEX - Publications

Songyang Chen, Yu Liu, Lei Zou, Zexuan Wang, Youfang Lin, and 2 more

Unsupervised graph alignment finds the one-to-one node correspondence between a pair of attributed graphs by only exploiting graph structure and node features. One category of existing works first computes the node representation and then matches nodes with close embeddings, which is intuitive but lacks a clear objective tailored for graph alignment in the unsupervised setting. The other category reduces the problem to optimal transport (OT) via Gromov-Wasserstein (GW) learning with a well-defined objective but leaves large room for exploring the design of the transport cost. We propose a principled...

10.48550/arxiv.2406.13216 preprint EN arXiv (Cornell University) 2024-06-19

Lion: Minimizing Distributed Transactions Through Adaptive Replica Provision

OPENALEX - Publications

Qiushi Zheng, Zhanhao Zhao, Wei Lu, Chang Yao, Yuxing Chen, and 2 more

10.1109/icde60146.2024.00161 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

IndeXY: A Framework for Constructing Indexes Larger than Memory

OPENALEX - Publications

Zhong Chen, Qingqing Zhou, Yuxing Chen, Xingsheng Zhao, Kuang He, and 2 more

10.1109/icde60146.2024.00046 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

TDSQL: Tencent Distributed Database System

OPENALEX - Publications

Yuxing Chen, Anqun Pan, Hailin Lei, Anda Ye, Shuo Han, and 5 more

Distributed databases have become indispensable in contemporary computing and data processing, owing to their pivotal role in ensuring high availability and scalability. They effectively cater to the requirements of data management and high-concurrency access. However, developing a distributed database system that is well-suited for diverse application scenarios, particularly large-scale applications, presents several challenges. These challenges include ensuring consistency and achieving high levels of performance. This paper...

10.14778/3685800.3685812 article EN Proceedings of the VLDB Endowment 2024-08-01

GeoTP: Latency-aware Geo-Distributed Transaction Processing in Database Middlewares (Extended Version)

OPENALEX - Publications

Q. Zhuang, Xu Shi, Shuang Liu, Wei Lü, Zhanhao Zhao, and 4 more

Database middleware for supporting distributed transaction processing is prevalent in numerous applications, with heterogeneous data sources deployed across national and international boundaries. However, transaction processing performance drops significantly due to the high network latency between the middleware and data sources and the long lock contention span, where transactions may be blocked while waiting for locks held by concurrent transactions. In this paper, we propose GeoTP, a latency-aware geo-distributed transaction processing approach...

10.48550/arxiv.2412.01213 preprint EN arXiv (Cornell University) 2024-12-02

Explorations and Exploitation for Parity-based RAIDs with Ultra-fast SSDs

OPENALEX - Publications

Shucheng Wang, Qiang Cao, Hong Jiang, Ziyi Lu, Jie Yao, and 2 more

Following a conventional design principle that pays more fast-CPU-cycles for fewer slow-I/Os, the popular software storage architecture Linux Multiple-Disk (MD) for parity-based RAID (e.g., RAID5 and RAID6) assigns one or more centralized worker threads to efficiently process all user requests based on multi-stage asynchronous control and global data structures, successfully exploiting the characteristics of slow devices, e.g., Hard Disk Drives (HDDs). However, we observe that, with high-performance NVMe-based...

10.1145/3627992 article EN ACM Transactions on Storage 2023-10-16

Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and Improvement

OPENALEX - Publications

Qian Ge, Yu Liu, Yinghao Zhao, Yuetian Sun, Lei Zou, and 2 more

SimRank-based similarity joins, which mainly include threshold-based and top-k similarity joins, are important types of all-pair SimRank queries. Although a line of related algorithms has been proposed recently, they still fall short of providing an approximation guarantee and suffer from scalability issues on medium to large graphs. Meanwhile, we also lack an extensive analysis of existing techniques in terms of accuracy and efficiency. Motivated by these challenges, we first conduct a detailed analysis of state-of-the-art algorithms and provide additional...

10.14778/3636218.3636219 article EN Proceedings of the VLDB Endowment 2023-12-01

Efficient time-interval data extraction in MVCC-based RDBMS

OPENALEX - Publications

Haixiang Li, Zhanhao Zhao, Yijian Cheng, Wei Lü, Xiaoyong Du, and 1 more

10.1007/s11280-018-0552-7 article EN World Wide Web 2018-04-11
