Anqun Pan

ORCID: 0000-0002-6756-149X
Research Areas
  • Distributed Systems and Fault Tolerance
  • Advanced Data Storage Technologies
  • Advanced Database Systems and Queries
  • Algorithms and Data Compression
  • Parallel Computing and Optimization Techniques
  • Service-Oriented Architecture and Web Services
  • Data Management and Algorithms
  • Cloud Computing and Resource Management
  • Data Quality and Management
  • Network Packet Processing and Optimization
  • Topic Modeling
  • Age of Information Optimization
  • Software System Performance and Reliability
  • Security and Verification in Computing
  • Distributed and Parallel Computing Systems
  • Caching and Content Delivery
  • Semantic Web and Ontologies
  • Recommender Systems and Techniques
  • Advanced Graph Neural Networks
  • Image Retrieval and Classification Techniques
  • Biomedical Text Mining and Ontologies
  • Access Control and Trust
  • Web Data Mining and Analysis
  • Data Visualization and Analytics
  • Cognitive Functions and Memory

Tencent (China)
2018-2025

Our extensive experiments reveal that existing key-value stores (KVSs) achieve high performance at the expense of a huge memory footprint that is often impractical or unacceptable. Even with emerging ultra-fast byte-addressable persistent memory (PM), KVSs fall far short of delivering the performance promised by PM's superior I/O bandwidth. To find the root causes and bridge the performance/memory-footprint gap, we revisit the architectural features of two representative indexing mechanisms (single-stage and multi-stage) and propose a three-stage...

10.14778/3648160.3648177 article EN Proceedings of the VLDB Endowment 2024-02-01

10.1109/tpds.2025.3533546 article EN IEEE Transactions on Parallel and Distributed Systems 2025-01-01

Achieving the serializable isolation level, regarded as the gold standard for transaction processing, is costly. Recent studies reveal that adjusting specific query patterns within a workload can still achieve serializability even at lower isolation levels. Nevertheless, these studies typically overlook the trade-off between the performance advantages of lower isolation levels and the overhead required to maintain serializability, potentially leading to suboptimal isolation level choices that fail to maximize performance. In this paper, we present TxnSails,...

10.48550/arxiv.2502.00991 preprint EN arXiv (Cornell University) 2025-02-02

The growth in data storage capacity and the increasing demands for high performance have created several challenges for concurrent indexing structures. One promising solution is the learned index, which uses a learning-based approach to fit the distribution of stored data and predictively locate target keys, significantly improving lookup performance. Despite their advantages, prevailing learned indexes exhibit constraints and encounter issues of scalability on multi-core data storage. This paper introduces SALI, Scalable Adaptive...

10.1145/3626752 article EN Proceedings of the ACM on Management of Data 2023-12-08

Snapshot isolation (SI) is a prevalent weak isolation level that avoids the performance penalty imposed by serializability and simultaneously prevents various undesired data anomalies. Nevertheless, SI anomalies have recently been found in production cloud databases that claim to provide the SI guarantee. Given the complex and often unavailable internals of such databases, a black-box SI checker is highly desirable. In this paper we present PolySI, a black-box checker that efficiently checks SI and provides understandable counterexamples upon detecting...

10.14778/3583140.3583145 article EN Proceedings of the VLDB Endowment 2023-02-01

AI and GPU technologies have been widely applied to solve big data problems. The total data volume worldwide reached 200 zettabytes in 2022. How to efficiently index the required content among massive data has become a serious problem. Recently, a promising learned index has been proposed to address this challenge: it has extremely high efficiency while retaining marginal space overhead. However, we notice that previous learned indexes have mainly focused on CPU architecture, ignoring the advantages of GPU. Because traditional indexes like B-Tree, LSM, and bitmap...

10.1109/tpds.2024.3381214 article EN IEEE Transactions on Parallel and Distributed Systems 2024-04-02

Driven by the recent adoption of temporal expressions into SQL:2011, extensions of temporal support in conventional database management systems (a.k.a. DBMSs) have re-emerged as a research hotspot. In this paper, we present a lightweight yet efficient built-in temporal implementation in Tencent's distributed database system, namely TDSQL. The novelty of TDSQL's temporal implementation includes: (1) a new temporal data model with extension, (2) various optimizations, which are also applicable to other DBMSs, and (3) low storage consumption, in which only data changes are maintained....

10.14778/3352063.3352122 article EN Proceedings of the VLDB Endowment 2019-08-01

Stream processing has been in widespread use, and one of the most common application scenarios is SQL query on streams. By 2021, the global deployment of IoT endpoints reached 12.3 billion, indicating a surge in data generation. However, the escalating demands for high throughput and low latency in stream processing systems have posed significant challenges due to the increasing data volume and evolving user requirements. We present a compression-based stream processing engine, called CompressStreamDB, which enables adaptive fine-grained stream processing directly...

10.1109/tkde.2024.3377710 article EN IEEE Transactions on Knowledge and Data Engineering 2024-03-19

The adoption of temporal expressions into SQL:2011 has continuously driven the extensions of temporal support in relational database systems (a.k.a. RDBMSs). In this paper, we present T-SQL, a lightweight yet efficient built-in temporal implementation in RDBMSs. T-SQL completely relies on multi-version concurrency control (MVCC), which is widely adopted in RDBMSs, to manage temporal data. For temporal data, current records are maintained in legacy databases, and historical records, i.e., previous versions (if any), used to be periodically...

10.1109/tkde.2021.3081717 article EN IEEE Transactions on Knowledge and Data Engineering 2021-01-01

Directly performing operations on compressed data has been proven to be a big success facing Big Data problems in modern data management systems. These systems have demonstrated significant compression benefits and performance improvement for data analytics applications. However, current systems only focus on data queries, while a complete system must support both data query and data manipulation. To solve this problem, we develop CompressDB, which is a new storage engine that can support data processing for databases without decompression. CompressDB...

10.1109/tkde.2023.3316274 article EN IEEE Transactions on Knowledge and Data Engineering 2023-09-18

Lossless data compression is an effective way to handle the huge transmission and storage overhead of massive text data. Its utility is even more significant today when data volumes are skyrocketing. The concept of operating on compressed data infuses new blood into efficient text management by enabling mainly access-oriented text processing tasks to be done directly on compressed data without decompression. Facing limitations of existing schemes such as limited types of operations supported, low efficiency, and high space occupation, we address...

10.1145/3626765 article EN Proceedings of the ACM on Management of Data 2023-12-08

Similarity search is a primitive operation in various database applications. Thus far, a large number of access methods have been proposed to accelerate similarity query processing. Nonetheless, these methods mostly focus on developing standalone systems by proposing new indices. Given the fact that existing RDBMSs merely support traditional indices, it is of great necessity and practical importance to develop a standard built-in index based approach for speeding up similarity queries. In this demonstration, we introduce MSQL+, a plugin...

10.14778/3229863.3236237 article EN Proceedings of the VLDB Endowment 2018-08-01

Distributed transaction processing often involves multiple rounds of cross-node communications, and therefore tends to be slow. To improve performance, existing approaches convert distributed transactions into single-node transactions by either migrating co-accessed partitions onto the same nodes or establishing a super node housing replicas of the entire database. However, migration-based methods might cause transactions to be blocked due to waiting for data migration, while the super node can become a bottleneck. In this paper, we present Lion,...

10.48550/arxiv.2403.11221 preprint EN arXiv (Cornell University) 2024-03-17

Unsupervised graph alignment finds the one-to-one node correspondence between a pair of attributed graphs by only exploiting graph structure and node features. One category of existing works first computes the node representation and then matches nodes with close embeddings, which is intuitive but lacks a clear objective tailored for graph alignment in the unsupervised setting. The other category reduces the problem to optimal transport (OT) via Gromov-Wasserstein (GW) learning with a well-defined objective but leaves large room for exploring the design of the transport cost. We propose a principled...

10.48550/arxiv.2406.13216 preprint EN arXiv (Cornell University) 2024-06-19

10.1109/icde60146.2024.00161 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

10.1109/icde60146.2024.00046 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

Distributed databases have become indispensable in contemporary computing and data processing, owing to their pivotal role in ensuring high availability and scalability. They effectively cater to the requirements of data management and high-concurrency access. However, developing a distributed database system that is well-suited for diverse application scenarios, particularly large-scale applications, presents several challenges. These challenges include ensuring consistency and achieving high levels of performance. This paper...

10.14778/3685800.3685812 article EN Proceedings of the VLDB Endowment 2024-08-01

The adoption of database middleware for supporting distributed transaction processing is prevalent in numerous applications, with heterogeneous data sources deployed across national and international boundaries. However, transaction processing performance drops significantly due to the high network latency between the middleware and data sources and the long lock contention span, where transactions may be blocked while waiting for locks held by concurrent transactions. In this paper, we propose GeoTP, a latency-aware geo-distributed transaction processing approach...

10.48550/arxiv.2412.01213 preprint EN arXiv (Cornell University) 2024-12-02

Following a conventional design principle that pays more fast-CPU-cycles for fewer slow-I/Os, the popular software storage architecture Linux Multiple-Disk (MD) for parity-based RAID (e.g., RAID5 and RAID6) assigns one or more centralized worker threads to efficiently process all user requests based on multi-stage asynchronous control and global data structures, successfully exploiting the characteristics of slow devices, e.g., Hard Disk Drives (HDDs). However, we observe that, with high-performance NVMe-based...

10.1145/3627992 article EN ACM Transactions on Storage 2023-10-16

SimRank-based similarity joins, which mainly include threshold-based and top-k similarity joins, are important types of all-pair SimRank queries. Although a line of related algorithms has been proposed recently, they still fall short of providing an approximation guarantee and suffer from scalability issues on medium to large graphs. Meanwhile, we also lack an extensive analysis of existing techniques in terms of accuracy and efficiency. Motivated by these challenges, we first conduct a detailed analysis of the state-of-the-art techniques and provide additional...

10.14778/3636218.3636219 article EN Proceedings of the VLDB Endowment 2023-12-01