NFDI4DS | UHH-SEMS - Publication Details

RadixSpline

OPENALEX - Publications

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, and 2 more

Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and slow to build. In fact, most approaches that we are aware of require multiple training passes over the data.

10.1145/3401071.3401659 article EN 2020-06-03

Benchmarking learned indexes

Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, and 3 more

Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We investigate the impact of caching, pipelining, dataset size, and key size. We study...

10.14778/3421424.3421425 article EN Proceedings of the VLDB Endowment 2020-09-01

Managing Non-Volatile Memory in Database Systems

Alexander van Renen, Viktor Leis, Alfons Kemper, Thomas Neumann, Takushi Hashida, and 4 more

Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate NVM into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient...

10.1145/3183713.3196897 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Persistent Memory I/O Primitives

Alexander van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, Alfons Kemper

I/O latency and throughput is one of the major performance bottlenecks for disk-based database systems. Upcoming persistent memory (PMem) technologies, like Intel's Optane DC Persistent Memory Modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide one of the first performance evaluations of PMem in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient PMem usage and two essential I/O primitives tuned for PMem: log writing and block flushing.

10.1145/3329785.3329930 article EN 2019-06-24

SOSD: A Benchmark for Learned Indexes

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, and 2 more

A groundswell of recent work has focused on improving data management systems with learned components. Specifically, work on learned index structures has proposed replacing traditional index structures, such as B-trees, with learned models. Given the decades of research committed to improving index structures, there is significant skepticism about whether learned indexes actually outperform state-of-the-art implementations of traditional structures on real-world data. To answer this question, we propose a new benchmarking framework that comes with a variety of real-world datasets and baseline implementations to compare against. We also...

10.48550/arxiv.1911.13014 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory

Philipp Fent, Alexander van Renen, Andreas Kipf, Viktor Leis, Thomas Neumann, and 1 more

While hardware and software improvements greatly accelerated modern database systems' internal operations, the decades-old stream-based Socket API for external communication is still unchanged. We show experimentally that for modern high-performance systems, networking has become a performance bottleneck. Therefore, we argue that the communication stack needs to be redesigned to fully exploit modern hardware - as has already happened to most other database system components. We propose L5, a high-performance communication layer for database systems. L5 rethinks the flow of data in and out of the database system and is based on direct memory...

10.1109/icde48307.2020.00131 article EN 2020 IEEE 36th International Conference on Data Engineering (ICDE) 2020-04-01

RadixSpline: A Single-Pass Learned Index

Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, and 2 more

Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and slow to build. In fact, most approaches that we are aware of require multiple training passes over the data. We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it has only two...

10.48550/arxiv.2004.14541 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Building blocks for persistent memory

Alexander van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, Alfons Kemper

I/O latency and throughput are two of the major performance bottlenecks for disk-based database systems. Persistent memory (PMem) technologies, like Intel’s Optane DC persistent memory modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide the first comprehensive performance evaluation of PMem on real hardware in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient PMem usage and four optimized low-level building blocks for PMem applications:...

10.1007/s00778-020-00622-9 article EN cc-by The VLDB Journal 2020-09-23

Plush

Lukas Vogel, Alexander van Renen, Satoshi Imamura, Jana Giceva, Thomas Neumann, and 1 more

Persistent memory (PMem) promised DRAM-like performance, byte addressability, and the persistency guarantees of conventional block storage. With the release of Intel Optane DCPMM, those expectations were dampened. While its write latency competes with DRAM, its read latency, write endurance, and especially bandwidth fall behind by up to an order of magnitude. Established PMem index structures mostly focus on lookups and cannot leverage PMem's low write latency. For inserts, DRAM-optimized index structures are still an order of magnitude faster than...

10.14778/3551793.3551839 article EN Proceedings of the VLDB Endowment 2022-07-01

Why TPC is Not Enough: An Analysis of the Amazon Redshift Fleet

Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, and 5 more

Database research and development is heavily influenced by benchmarks, such as the industry-standard TPC-H and TPC-DS for analytical systems. However, these twenty-year-old benchmarks neither capture how databases are deployed nor what workloads modern cloud data warehouse systems face these days. In this paper, we summarize well-known, confirm suspected, and unearth novel discrepancies between TPC-H/DS and actual workloads using empirical data. We base our analysis on telemetrics from Amazon Redshift - one of the largest...

10.14778/3681954.3682031 article EN Proceedings of the VLDB Endowment 2024-07-01

Cloud Analytics Benchmark

Alexander van Renen, Viktor Leis

The cloud facilitates the transition to a service-oriented perspective. This affects cloud-native data management in general, and data analytics in particular. Instead of managing a multi-node database cluster on-premise, end users simply send queries to a managed cloud data warehouse and receive results. While this is obviously very attractive for end users, database system architects still have to engineer systems for this new service model. There are currently many competing architectures ranging from self-hosted (Presto, PostgreSQL), over...

10.14778/3583140.3583156 article EN Proceedings of the VLDB Endowment 2023-02-01

The Case for Learned Spatial Indexes

Varun Pandey, Alexander van Renen, Andreas Kipf, Ibrahim Sabek, Jialin Ding, and 1 more

Spatial data is ubiquitous. Massive amounts of data are generated every day from billions of GPS-enabled devices such as cell phones, cars, sensors, and various consumer-based applications such as Uber, Tinder, location-tagged posts in Facebook, Twitter, Instagram, etc. This exponential growth in spatial data has led the research community to focus on building systems that can process spatial data efficiently. In the meantime, recent research has introduced learned index structures. In this work, we use techniques proposed in a state-of-the-art...

10.48550/arxiv.2008.10349 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Mosaic

Lukas Vogel, Viktor Leis, Alexander van Renen, Thomas Neumann, Satoshi Imamura, and 1 more

Relational database systems are purpose-built for a specific storage device class (e.g., HDD, SSD, or DRAM). They do not cope well with the multitude of storage devices that are competitive at their price 'sweet spots'. To make use of different storage device classes, users have to resort to workarounds, such as storing data in different tablespaces. A lot of research has been done on heterogeneous storage frameworks for distributed big data query engines. These engines scale well for big data sets but are often CPU- or network-bound. Both approaches only maximize performance...

10.14778/3407790.3407852 article EN Proceedings of the VLDB Endowment 2020-08-01

How Good Are Modern Spatial Libraries?

Varun Pandey, Alexander van Renen, Andreas Kipf, Alfons Kemper

Many applications today like Uber, Yelp, Tinder, etc. rely on spatial data or locations from their users. These applications and services either build their own spatial data management systems or rely on existing solutions. JTS Topology Suite (JTS), its C++ port GEOS, Google S2, ESRI Geometry API, and Java Spatial Index (JSI) are some of the spatial processing libraries that these systems build upon. These applications depend on the indexing capabilities available in these libraries for high-performance spatial query processing. In this work, we compare these libraries qualitatively and quantitatively based on four...

10.1007/s41019-020-00147-9 article EN cc-by Data Science and Engineering 2020-11-07

FastVer: Making Data Integrity a Commodity

Arvind Arasu, Badrish Chandramouli, Johannes Gehrke, Esha Ghosh, Donald Kossmann, and 8 more

We present FastVer, a high-performance key-value store with strong data integrity guarantees. FastVer is built as an extension of FASTER, an open-source, high-performance key-value store. It offers the same key-value API as FASTER plus an additional verify() method that detects if an unauthorized attacker tampered with the database and checks whether the results of all read operations are consistent with historical updates. FastVer is based on a novel approach that combines the advantages of Merkle trees and deferred memory verification. We show that this approach achieves one to two orders of magnitude...

10.1145/3448016.3457312 article EN Proceedings of the 2021 International Conference on Management of Data 2021-06-09

Data Management on Non-Volatile Memory: A Perspective

Philipp Götze, Alexander van Renen, Lucas Lersch, Viktor Leis, Ismail Oukid

10.1007/s13222-018-0301-1 article EN Datenbank-Spektrum 2018-10-05

SageDB

Jialin Ding, Ryan Marcus, Andreas Kipf, Vikram Nathan, Aniruddha Nrusimha, and 3 more

Modern data systems are typically both complex and general-purpose. They are complex because of the numerous internal knobs and parameters that users need to manually tune in order to achieve good performance; they are general-purpose because they are designed to handle diverse use cases, and therefore often do not achieve the best possible performance for any specific use case. A recent trend aims to tackle these pitfalls: instance-optimized systems automatically self-adjust to a specific use case, i.e., a dataset and query workload. Thus far, the research community has focused on...

10.14778/3565838.3565857 article EN Proceedings of the VLDB Endowment 2022-09-01

Corra: Correlation-Aware Column Compression

Hanwen Liu, Mihail Stoian, Alexander van Renen, Andreas Kipf

Column encoding schemes have witnessed a spark of interest lately. This is not surprising -- as data volume increases, being able to keep one's dataset in main memory for fast processing is a coveted desideratum. However, it also seems that single-column encoding schemes have reached a plateau in terms of the compression size they can achieve. We argue that this is because they do not exploit correlations in the data. Consider for instance the column pair ($\texttt{city}$, $\texttt{zip-code}$) in the DMV dataset: a city has only a few dozen unique zip codes. Such...

10.48550/arxiv.2403.17229 preprint EN arXiv (Cornell University) 2024-03-25

DataLoom: Simplifying Data Loading with LLMs

Alexander van Renen, Mihail Stoian, Andreas Kipf

Schema discovery and data loading is a crucial step in any data analysis pipeline. While this used to be a rare task, with the highly dynamic field of machine learning and modern business intelligence on top of data lakes, today it has become a frequent, but often underestimated, activity. Existing tools focus on single files, presume prior knowledge on the user's side, or require a significant amount of manual labor. In this paper, we improve the process of mapping a "chaotic" set of files to an initial database schema that can then be iteratively refined...

10.14778/3685800.3685897 article EN Proceedings of the VLDB Endowment 2024-08-01

Lightweight Correlation-Aware Table Compression

Mihail Stoian, Alexander van Renen, Jan Kobiolka, Paula Kuo, Josif Grabocka, and 1 more

The growing adoption of data lakes for managing relational data necessitates efficient, open storage formats that provide high scan performance and competitive compression ratios. While existing formats achieve fast scans through lightweight encoding techniques, they have reached a plateau in terms of minimizing storage footprint. Recently, correlation-aware compression schemes have been shown to reduce file sizes further. Yet, current approaches either incur significant scan overheads or require manual specification of correlations,...

10.48550/arxiv.2410.14066 preprint EN arXiv (Cornell University) 2024-10-17

Persistent Memory I/O Primitives

Alexander van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, Alfons Kemper

I/O latency and throughput is one of the major performance bottlenecks for disk-based database systems. Upcoming persistent memory (PMem) technologies, like Intel's Optane DC Persistent Memory Modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide one of the first performance evaluations of PMem in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient PMem usage and two essential I/O primitives tuned for PMem: log writing and block flushing.

10.48550/arxiv.1904.01614 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Enhancing In-Memory Spatial Indexing with Learned Search

Varun Pandey, Alexander van Renen, Eleni Tzirita Zacharatou, Andreas Kipf, Ibrahim Sabek, and 3 more

Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional...

10.48550/arxiv.2309.06354 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01