- Advanced Data Storage Technologies
- Advanced Database Systems and Queries
- Distributed Systems and Fault Tolerance
- Cloud Computing and Resource Management
- Parallel Computing and Optimization Techniques
- Data Management and Algorithms
- Caching and Content Delivery
- Algorithms and Data Compression
- Data Quality and Management
- Distributed and Parallel Computing Systems
- Graph Theory and Algorithms
- Network Packet Processing and Optimization
- Semantic Web and Ontologies
- Interconnection Networks and Systems
- Web Data Mining and Analysis
- Internet Traffic Analysis and Secure E-voting
- IoT and Edge/Fog Computing
- Bayesian Modeling and Causal Inference
- Scientific Computing and Data Management
- Constraint Satisfaction and Optimization
- Data Mining Algorithms and Applications
- Advanced Image and Video Retrieval Techniques
- Video Coding and Compression Technologies
- Time Series Analysis and Forecasting
- Software System Performance and Reliability
Technical University of Munich
2014-2025
Friedrich-Alexander-Universität Erlangen-Nürnberg
2021-2022
Friedrich Schiller University Jena
2019-2021
Tableau Software (United States)
2018
Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while cardinality estimates are essential for finding a good join order, query performance is unsatisfactory if the engine relies too heavily on...
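To illustrate where the large estimation errors come from, here is a small, hypothetical sketch (the data and predicates are invented, not from the benchmark): the common independence assumption multiplies per-column selectivities, which underestimates the result for correlated predicates, and the q-error quantifies how far off the estimate is.

```python
# Toy illustration of cardinality estimation under the independence
# assumption, and the resulting q-error (max(est, true) / min(est, true)).
rows = [("drama", 1990), ("drama", 1991), ("drama", 1992),
        ("comedy", 2005), ("comedy", 2006), ("drama", 1993),
        ("drama", 1994), ("comedy", 2007), ("drama", 1995), ("drama", 1996)]

n = len(rows)
sel_genre = sum(1 for g, _ in rows if g == "drama") / n   # 0.7
sel_year = sum(1 for _, y in rows if y < 2000) / n        # 0.7

# Independence assumption: multiply the individual selectivities.
estimate = sel_genre * sel_year * n                       # 4.9 rows

# True cardinality: here the two predicates are perfectly correlated.
true_card = sum(1 for g, y in rows if g == "drama" and y < 2000)  # 7 rows

q_error = max(estimate, true_card) / min(estimate, true_card)
print(round(estimate, 2), true_card, round(q_error, 2))
```

On real, skewed data and across several joins, such errors compound multiplicatively, which is why join orders picked from bad estimates can be far from optimal.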
Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main...
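The basic radix-tree idea behind ART can be sketched minimally as follows. This is a deliberately naive version: keys are decomposed into bytes and each byte selects a child. ART's actual contributions, adaptive node layouts (Node4/16/48/256) and path compression, are not shown here.

```python
# Simplified, non-adaptive radix tree (trie) over byte-string keys.
class RadixTree:
    def __init__(self):
        self.root = {}

    def insert(self, key: bytes, value):
        node = self.root
        for b in key:                      # one tree level per key byte
            node = node.setdefault(b, {})
        node["value"] = value              # sentinel slot for the payload

    def lookup(self, key: bytes):
        node = self.root
        for b in key:
            if b not in node:
                return None                # key byte has no child: miss
            node = node[b]
        return node.get("value")

t = RadixTree()
t.insert(b"ant", 1)
t.insert(b"art", 2)
print(t.lookup(b"art"), t.lookup(b"arc"))  # -> 2 None
```

Because the tree height depends only on key length, not on the number of keys, lookups touch a bounded number of nodes; ART's adaptive node sizes then keep each of those nodes cache-friendly.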
With modern computer architecture evolving, two problems conspire against the state-of-the-art approaches in parallel query execution: (i) to take advantage of many-cores, all query work must be distributed evenly among (soon) hundreds of threads in order to achieve good speedup, yet (ii) dividing the work evenly is difficult even with accurate data statistics due to the complexity of modern out-of-order cores. As a result, the existing approaches for plan-driven parallelism run into load balancing and context-switching bottlenecks, and therefore no longer...
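The morsel-driven alternative described in this line of work can be sketched as follows (a hypothetical toy, not the paper's engine): the input is split into small fixed-size "morsels", and worker threads pull morsels from a shared queue, so load balances dynamically instead of relying on a static partitioning.

```python
import queue
import threading

def parallel_sum(data, workers=4, morsel_size=1000):
    """Sum `data` by handing out small morsels to worker threads."""
    work = queue.Queue()
    for i in range(0, len(data), morsel_size):
        work.put(data[i:i + morsel_size])     # enqueue one morsel

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                morsel = work.get_nowait()    # pull next available morsel
            except queue.Empty:
                return                        # no work left: thread exits
            partial = sum(morsel)
            with lock:
                results.append(partial)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

print(parallel_sum(list(range(10_000))))  # -> 49995000
```

A slow thread simply claims fewer morsels, so stragglers do not stall the whole query the way a fixed per-thread partition would.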
We describe a new deep learning approach to cardinality estimation. MSCN is a multi-set convolutional network, tailored to representing relational query plans, that employs set semantics to capture query features and true cardinalities. MSCN builds on sampling-based estimation, addressing its weaknesses when no sampled tuples qualify a predicate, and in capturing join-crossing correlations. Our evaluation of MSCN using a real-world dataset shows that it significantly enhances the quality of cardinality estimation, which is the core problem of query optimization.
We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the point query performance of state-of-the-art order-preserving indexes, while consuming only 10 bits per trie node. The false positive rates in SuRF for both point and range queries are tunable to satisfy different application...
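The core range-filter idea can be sketched with a much simpler structure than FST (this is a hypothetical illustration, not SuRF's encoding): store truncated key prefixes in sorted order; a probe answers "maybe present" whenever some stored prefix could match, so truncation buys space at the cost of tunable false positives.

```python
import bisect

class PrefixRangeFilter:
    """Approximate point/range filter over fixed-length key prefixes.
    Truncating keys saves space but introduces false positives, loosely
    analogous to SuRF's tunable suffix truncation."""

    def __init__(self, keys, prefix_len=3):
        self.plen = prefix_len
        self.prefixes = sorted({k[:prefix_len] for k in keys})

    def may_contain(self, key):
        i = bisect.bisect_left(self.prefixes, key[:self.plen])
        return i < len(self.prefixes) and self.prefixes[i] == key[:self.plen]

    def may_contain_range(self, lo, hi):
        # "Maybe" if any stored prefix falls between the two bounds.
        i = bisect.bisect_left(self.prefixes, lo[:self.plen])
        return i < len(self.prefixes) and self.prefixes[i] <= hi[:self.plen]

f = PrefixRangeFilter(["apple", "banana", "cherry"])
print(f.may_contain("application"))       # -> True (false positive: "app")
print(f.may_contain_range("bat", "bba"))  # -> False (true negative)
```

Longer prefixes lower the false positive rate but cost more space per key, which is exactly the tuning knob the abstract refers to.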
Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate NVM into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient...
So far, transactional memory, although a promising technique, has suffered from the absence of an efficient hardware implementation. The upcoming Haswell microarchitecture from Intel introduces hardware transactional memory (HTM) in mainstream CPUs. HTM allows for efficient concurrent, atomic operations, which is also highly desirable in the context of databases. On the other hand, HTM has several limitations that, in general, prevent a one-to-one mapping of database transactions to HTM transactions. In this work we devise building blocks that can be used to exploit...
The performance of transactional database systems is critically dependent on the efficient synchronization of in-memory data structures. The traditional approach, fine-grained locking, does not scale on modern hardware. Lock-free data structures, in contrast, scale very well but are extremely difficult to implement and often require additional indirections. In this work, we argue for a middle ground, i.e., synchronization protocols that use locking, but only sparingly. We synchronize the Adaptive Radix Tree (ART) using two such protocols,...
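The "locking, but only sparingly" idea rests on optimistic version validation: readers never lock, they just re-check a per-node version counter after reading and restart if a writer interfered. The single-threaded toy below shows only the validation protocol, not real concurrency or ART itself.

```python
class OptimisticNode:
    """Node guarded by a version counter: even = unlocked, odd = a writer
    holds the node. Readers validate the version instead of locking."""

    def __init__(self, payload):
        self.version = 0
        self.payload = payload

def optimistic_read(node):
    while True:
        v = node.version
        if v % 2 == 1:
            continue               # writer active: retry from the start
        data = node.payload        # speculative, unsynchronized read
        if node.version == v:
            return data            # version unchanged: read was consistent

def locked_write(node, payload):
    assert node.version % 2 == 0, "node already write-locked"
    node.version += 1              # lock: version becomes odd
    node.payload = payload
    node.version += 1              # unlock: new, even version

n = OptimisticNode("a")
locked_write(n, "b")
print(optimistic_read(n))  # -> b
```

Readers thus pay only two version loads in the common uncontended case, which is why such protocols scale far better than fine-grained locking.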
We present the Height Optimized Trie (HOT), a fast and space-efficient in-memory index structure. The core algorithmic idea of HOT is to dynamically vary the number of bits considered at each node, which enables a consistently high fanout and thereby good cache efficiency. The layout of each node is carefully engineered for compactness and fast search using SIMD instructions. Our experimental results, which use a wide variety of workloads and data sets, show that HOT outperforms other state-of-the-art index structures for string keys both in terms...
Disk-based database systems use buffer managers in order to transparently manage data sets larger than main memory. This traditional approach is effective at minimizing the number of I/O operations, but is also a major source of overhead in comparison with in-memory systems. To avoid this overhead, in-memory database systems therefore abandon buffer management altogether, which makes handling data sets larger than main memory very difficult. In this work, we revisit this fundamental dichotomy and design a novel storage manager that is optimized for modern hardware. Our...
In 2013, Microsoft Research proposed the Bw-Tree (humorously termed the "Buzz Word Tree"), a lock-free index that provides high throughput for transactional database workloads in SQL Server's Hekaton engine. The Bw-Tree avoids locks by appending delta records to tree nodes and using an indirection layer that allows it to atomically update physical pointers using compare-and-swap (CaS). Correctly implementing these techniques requires careful attention to detail. Unfortunately, the available papers are missing important...
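The delta-append mechanism can be sketched as follows. This is a hypothetical, heavily simplified illustration: Python has no user-level compare-and-swap, so the CaS on the mapping table is simulated with a lock, and tree structure, consolidation, and splits are all omitted.

```python
import threading

class MappingTable:
    """Bw-Tree-style indirection layer: a logical page ID maps to the head
    of a delta chain. CaS is simulated here with a lock."""

    def __init__(self):
        self.slots = {}
        self._lock = threading.Lock()

    def cas(self, pid, expected, new):
        with self._lock:                 # stands in for an atomic CaS
            if self.slots.get(pid) is expected:
                self.slots[pid] = new
                return True
            return False

class Delta:
    def __init__(self, key, value, nxt):
        self.key, self.value, self.next = key, value, nxt

def insert(table, pid, key, value):
    while True:                          # retry until the CaS succeeds
        head = table.slots.get(pid)
        if table.cas(pid, head, Delta(key, value, head)):
            return

def lookup(table, pid, key):
    node = table.slots.get(pid)
    while node is not None:              # walk the delta chain, newest first
        if node.key == key:
            return node.value
        node = node.next
    return None

t = MappingTable()
insert(t, 1, "k", 10)
insert(t, 1, "k", 20)                    # newer delta shadows the older one
print(lookup(t, 1, "k"))  # -> 20
```

Because updates only prepend deltas and publish them with a single CaS, concurrent writers never block each other; the price is that readers must traverse the chain until a consolidation rewrites the node.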
I/O latency and throughput is one of the major performance bottlenecks for disk-based database systems. Upcoming persistent memory (PMem) technologies, like Intel's Optane DC Persistent Memory Modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus to eliminate this bottleneck. In this paper, we provide one of the first performance evaluations of PMem in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient PMem usage and two essential I/O primitives tuned for PMem: log writing and block flushing.
NVMe SSDs based on flash are cheap and offer high throughput. Combining several of these devices into a single server enables 10 million I/O operations per second or more. Our experiments show that existing out-of-memory database systems and storage engines achieve only a fraction of this performance. In this work, we demonstrate that it is possible to close the performance gap between hardware and software through an optimized storage engine design. In a heavy out-of-memory setting, where the data set is much larger than main memory, our system...
Compiling queries to machine code is a very efficient way of executing queries. One often overlooked problem with compilation is the time it takes to generate machine code. Even with fast compilation frameworks like LLVM, generating machine code for complex queries can take hundreds of milliseconds. Such durations can be a major disadvantage for workloads that execute many complex, but quick queries. To solve this problem, we propose an adaptive execution framework, which dynamically switches from interpretation to compilation. We also propose a fast bytecode interpreter without costly...
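The switching idea can be sketched with a toy expression evaluator (hypothetical; the paper's framework operates on compiled query plans, not Python ASTs): start by interpreting, and switch to compiled code once the same query has run often enough to amortize compilation time.

```python
# Expressions are nested tuples, e.g. ("add", ("col", 0), ("const", 5)).

def interpret(expr, row):
    op = expr[0]
    if op == "const": return expr[1]
    if op == "col":   return row[expr[1]]
    if op == "add":   return interpret(expr[1], row) + interpret(expr[2], row)
    if op == "mul":   return interpret(expr[1], row) * interpret(expr[2], row)
    raise ValueError(op)

def to_source(expr):
    """Generate Python source for the expression (the 'compiler')."""
    op = expr[0]
    if op == "const": return repr(expr[1])
    if op == "col":   return f"row[{expr[1]}]"
    sym = {"add": "+", "mul": "*"}[op]
    return f"({to_source(expr[1])} {sym} {to_source(expr[2])})"

class AdaptiveQuery:
    COMPILE_AFTER = 3                  # switch threshold, arbitrary here

    def __init__(self, expr):
        self.expr, self.runs, self.compiled = expr, 0, None

    def __call__(self, row):
        self.runs += 1
        if self.compiled is None and self.runs > self.COMPILE_AFTER:
            code = compile(to_source(self.expr), "<query>", "eval")
            self.compiled = lambda row: eval(code, {}, {"row": row})
        if self.compiled is not None:
            return self.compiled(row)  # fast path after the switch
        return interpret(self.expr, row)

q = AdaptiveQuery(("add", ("mul", ("col", 0), ("const", 2)), ("const", 1)))
print([q((i,)) for i in range(5)])  # -> [1, 3, 5, 7, 9]
```

Short-lived queries never pay for compilation, while long-running or frequently repeated ones eventually execute on the fast path, which is exactly the trade-off adaptive execution targets.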
Industrial as well as academic analytics systems are usually evaluated based on well-known standard benchmarks, such as TPC-H or TPC-DS. These benchmarks test various components of the DBMS, including the join optimizer, the implementation of join and aggregation operators, and the concurrency control scheduler. However, these benchmarks fall short of evaluating the "real" challenges imposed by modern BI systems, such as Tableau, that emit machine-generated query workloads. This paper reports a comprehensive study of a set of more than 60k real-world data...
The query engines of most modern database systems are either based on vectorization or on data-centric code generation. These two state-of-the-art query processing paradigms are fundamentally different in terms of system structure and query execution code. Both paradigms have been used to build fast systems. However, until today it is not clear which paradigm yields faster query execution, as many implementation-specific choices obstruct a direct comparison of the architectures. In this paper, we experimentally compare the two models by...
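The structural difference between the two paradigms can be made concrete with a toy query, SELECT sum(x) WHERE x > t (a hypothetical sketch, not either system): vectorization runs each operator over a batch and materializes intermediates, while data-centric code generation fuses the pipeline into one tight loop (hand-written here, in place of generated machine code).

```python
def vectorized(xs, t, batch=1024):
    """Vectorized style: batch-at-a-time operators with a materialized
    intermediate (the qualifying values) between filter and aggregation."""
    total = 0
    for i in range(0, len(xs), batch):
        chunk = xs[i:i + batch]
        qualifying = [x for x in chunk if x > t]   # filter operator
        total += sum(qualifying)                   # aggregation operator
    return total

def data_centric(xs, t):
    """Data-centric style: the whole pipeline fused into one loop,
    keeping each tuple in 'registers' with no intermediate buffer."""
    total = 0
    for x in xs:
        if x > t:
            total += x
    return total

data = list(range(10))
print(vectorized(data, 4), data_centric(data, 4))  # -> 35 35
```

Both compute the same result; the paradigms differ in where the per-tuple work happens and what gets materialized in between, which is the axis the paper's comparison isolates.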
In this paper, we propose ScaleStore, a novel distributed storage engine that exploits DRAM caching, NVMe storage, and RDMA networking to achieve high performance, cost-efficiency, and scalability at the same time. Using low latency RDMA messages, ScaleStore implements a transparent memory abstraction that provides access to the aggregated DRAM of all nodes. In contrast to existing designs such as NAM-DB or FaRM, ScaleStore stores cold data on SSDs (flash), lowering the overall hardware cost significantly. The core of ScaleStore is a caching strategy...
Most database management systems cache pages from storage in a main memory buffer pool. To do this, they either rely on a hash table that translates page identifiers into pointers, or on pointer swizzling, which avoids this translation. In this work, we propose vmcache, a buffer manager design that instead uses hardware-supported virtual memory to translate page identifiers to virtual addresses. In contrast to existing mmap-based approaches, the DBMS retains control over page faulting and eviction. Our design is portable across modern operating systems, supports...
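The translation trick at the heart of this design can be sketched as follows (a minimal illustration only: it shows why no hash table is needed, but omits the parts that make vmcache a buffer manager, i.e., controlled faulting, eviction, and page states). A single large virtual memory region is reserved up front, so a page's address is simply `base + pid * PAGE_SIZE`.

```python
import mmap

PAGE_SIZE = 4096
MAX_PAGES = 64

# Reserve one contiguous anonymous virtual memory region for all pages.
# Translation from page ID to address is pure arithmetic: no hash table,
# no pointer swizzling.
pool = mmap.mmap(-1, MAX_PAGES * PAGE_SIZE)

def page(pid: int) -> memoryview:
    """Return a writable view of the page with the given identifier."""
    off = pid * PAGE_SIZE
    return memoryview(pool)[off:off + PAGE_SIZE]

page(7)[:5] = b"hello"
print(bytes(page(7)[:5]))  # -> b'hello'
```

The OS only backs the virtual pages that are actually touched with physical memory, which is what lets the region be sized for the full data set while the resident working set stays small.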
Remote data structures built with one-sided Remote Direct Memory Access (RDMA) are at the heart of many disaggregated database management systems today. Concurrent access to these data structures by thousands of remote workers necessitates a highly efficient synchronization scheme. Remarkably, our investigation reveals that existing schemes display substantial variations in performance and scalability. Even worse, some schemes do not correctly synchronize, resulting in rare but hard-to-detect data corruption. Motivated by these observations, we...
This paper revisits the longstanding challenge of coordinating database systems with general-purpose OS interfaces, such as POSIX, which often lack tailored support for DB requirements. Existing approaches to this DB-OS co-design struggle with a limited design space, security risks, and compatibility issues. To overcome these hurdles, we propose a new approach that leverages virtualization to elevate the privilege level of DBMS processes. Our method enables the DBMS to fully exploit hardware capabilities via virtualization,...
Large main memory capacity and even larger data sets have motivated hybrid storage systems, which serve most transactions from memory, but can seamlessly transition to flash storage. In such systems, the index structure of choice is usually a B-Tree with pageable nodes. Most academic work considers only fixed-size records, making those designs unsuitable for practical applications. Given the prevalence of B-Trees, surprisingly few available implementations and benchmarks of optimized B-Trees cover variable-sized records. In this...