Alexander van Renen

ORCID: 0000-0002-6365-4592
Research Areas
  • Advanced Data Storage Technologies
  • Advanced Database Systems and Queries
  • Cloud Computing and Resource Management
  • Data Management and Algorithms
  • Geographic Information Systems Studies
  • Distributed systems and fault tolerance
  • Algorithms and Data Compression
  • Parallel Computing and Optimization Techniques
  • Caching and Content Delivery
  • Data Stream Mining Techniques
  • Distributed and Parallel Computing Systems
  • Time Series Analysis and Forecasting
  • Data Quality and Management
  • Data Mining Algorithms and Applications
  • Research Data Management Practices
  • Cloud Data Security Solutions
  • Security and Verification in Computing
  • 3D Modeling in Geospatial Applications
  • Scientific Computing and Data Management
  • Advanced Text Analysis Techniques
  • IoT and Edge/Fog Computing
  • Advanced Data Compression Techniques
  • Semantic Web and Ontologies

Friedrich-Alexander-Universität Erlangen-Nürnberg
2022-2023

Technical University of Munich
2018-2021

Intel (United States)
2020

Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and slow to build. In fact, most approaches that we are aware of require multiple training passes over the data.

10.1145/3401071.3401659 article EN 2020-06-03

Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We investigate the impact of caching, pipelining, dataset size, and key size. We study...

10.14778/3421424.3421425 article EN Proceedings of the VLDB Endowment 2020-09-01

Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate NVM into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient...

10.1145/3183713.3196897 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

I/O latency and throughput is one of the major performance bottlenecks for disk-based database systems. Upcoming persistent memory (PMem) technologies, like Intel's Optane DC Persistent Memory Modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide one of the first performance evaluations of PMem in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient usage of PMem and two essential I/O primitives tuned for PMem: log writing and block flushing.

10.1145/3329785.3329930 article EN 2019-06-24

A groundswell of recent work has focused on improving data management systems with learned components. Specifically, work on learned index structures has proposed replacing traditional index structures, such as B-trees, with learned models. Given the decades of research committed to improving index structures, there is significant skepticism about whether learned indexes actually outperform state-of-the-art implementations of traditional structures on real-world data. To answer this question, we propose a new benchmarking framework that comes with a variety of real-world datasets and baseline implementations to compare against. We also...

10.48550/arxiv.1911.13014 preprint EN other-oa arXiv (Cornell University) 2019-01-01

While hardware and software improvements greatly accelerated modern database systems' internal operations, the decades-old stream-based Socket API for external communication is still unchanged. We show experimentally that for modern high-performance systems, networking has become a performance bottleneck. Therefore, we argue that the communication stack needs to be redesigned to fully exploit modern hardware - as has already happened to most other database system components. We propose L5, a high-performance communication layer for database systems. L5 rethinks the flow of data in and out of the database system and is based on direct memory...

10.1109/icde48307.2020.00131 article EN 2020 IEEE 36th International Conference on Data Engineering (ICDE) 2020-04-01

Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and slow to build. In fact, most approaches that we are aware of require multiple training passes over the data. We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it only has two...

10.48550/arxiv.2004.14541 preprint EN other-oa arXiv (Cornell University) 2020-01-01

I/O latency and throughput are two of the major performance bottlenecks for disk-based database systems. Persistent memory (PMem) technologies, like Intel’s Optane DC persistent memory modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide the first comprehensive performance evaluation of PMem on real hardware in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient usage of PMem and four optimized low-level building blocks for PMem applications:...

10.1007/s00778-020-00622-9 article EN cc-by The VLDB Journal 2020-09-23

Persistent memory (PMem) promised DRAM-like performance, byte addressability, and the persistency guarantees of conventional block storage. With the release of Intel Optane DCPMM, those expectations were dampened. While its write latency competes with DRAM, its read latency, write endurance, and especially bandwidth fall behind by up to an order of magnitude. Established PMem index structures mostly focus on lookups and cannot leverage PMem's low write latency. For inserts, DRAM-optimized index structures are still an order of magnitude faster than...

10.14778/3551793.3551839 article EN Proceedings of the VLDB Endowment 2022-07-01

Database research and development is heavily influenced by benchmarks, such as the industry-standard TPC-H and TPC-DS for analytical systems. However, these twenty-year-old benchmarks neither capture how databases are deployed nor what workloads modern cloud data warehouse systems face these days. In this paper, we summarize well-known, confirm suspected, and unearth novel discrepancies between TPC-H/DS and actual workloads using empirical data. We base our analysis on telemetrics from Amazon Redshift - one of the largest...

10.14778/3681954.3682031 article EN Proceedings of the VLDB Endowment 2024-07-01

The cloud facilitates the transition to a service-oriented perspective. This affects cloud-native data management in general, and data analytics in particular. Instead of managing a multi-node database cluster on-premise, end users simply send queries to a managed cloud data warehouse and receive results. While this is obviously very attractive for end users, database system architects still have to engineer systems for this new service model. There are currently many competing architectures ranging from self-hosted (Presto, PostgreSQL), over...

10.14778/3583140.3583156 article EN Proceedings of the VLDB Endowment 2023-02-01

Spatial data is ubiquitous. Massive amounts of data are generated every day from billions of GPS-enabled devices such as cell phones, cars, sensors, and various consumer-based applications such as Uber, Tinder, location-tagged posts in Facebook, Twitter, Instagram, etc. This exponential growth in spatial data has led the research community to focus on building systems that can process spatial data efficiently. In the meantime, recent research has introduced learned index structures. In this work, we use techniques proposed in a state-of-the-art...

10.48550/arxiv.2008.10349 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Relational database systems are purpose-built for a specific storage device class (e.g., HDD, SSD, or DRAM). They do not cope well with the multitude of storage devices that are competitive at their price 'sweet spots'. To make use of different storage device classes, users have to resort to workarounds, such as storing data in different tablespaces. A lot of research has been done on heterogeneous storage frameworks for distributed big data query engines. These engines scale well for big data sets but are often CPU- or network-bound. Both approaches only maximize performance...

10.14778/3407790.3407852 article EN Proceedings of the VLDB Endowment 2020-08-01

Many applications today like Uber, Yelp, Tinder, etc. rely on spatial data or locations from their users. These applications and services either build their own spatial data management systems or rely on existing solutions. JTS Topology Suite (JTS), its C++ port GEOS, Google S2, ESRI Geometry API, and Java Spatial Index (JSI) are some of the spatial processing libraries that these systems build upon. These applications depend on the indexing capabilities available in these libraries for high-performance spatial query processing. In this work, we compare these libraries qualitatively and quantitatively based on four...

10.1007/s41019-020-00147-9 article EN cc-by Data Science and Engineering 2020-11-07

We present FastVer, a high-performance key-value store with strong data integrity guarantees. FastVer is built as an extension of FASTER, an open-source, high-performance key-value store. It offers the same key-value API as FASTER plus an additional verify() method that detects if an unauthorized attacker tampered with the database and checks whether the results of all read operations are consistent with historical updates. FastVer is based on a novel approach that combines the advantages of Merkle trees and deferred memory verification. We show that this approach achieves one to two orders of magnitudes...

10.1145/3448016.3457312 article EN Proceedings of the 2021 International Conference on Management of Data 2021-06-09

Modern data systems are typically both complex and general-purpose. They are complex because of the numerous internal knobs and parameters that users need to manually tune in order to achieve good performance; they are general-purpose because they are designed to handle diverse use cases, and therefore often do not achieve the best possible performance for any specific use case. A recent trend aims to tackle these pitfalls: instance-optimized systems automatically self-adjust to a specific use case, i.e., a dataset and query workload. Thus far, the research community has focused on...

10.14778/3565838.3565857 article EN Proceedings of the VLDB Endowment 2022-09-01

Column encoding schemes have witnessed a spark of interest lately. This is not surprising -- as data volume increases, being able to keep one's dataset in main memory for fast processing is a coveted desideratum. However, it also seems that single-column encoding schemes have reached a plateau in terms of the compression size they can achieve. We argue that this is because they do not exploit correlations in the data. Consider for instance the column pair ($\texttt{city}$, $\texttt{zip-code}$) of the DMV dataset: a city has only a few dozen unique zip codes. Such...

10.48550/arxiv.2403.17229 preprint EN arXiv (Cornell University) 2024-03-25

Schema discovery and data loading is a crucial step in any data analysis pipeline. While this used to be a rare task, in the highly dynamic field of machine learning and modern business intelligence on top of data lakes, today it has become a frequent, but often underestimated, activity. Existing tools focus on single files, presume prior knowledge on the user's side, or require a significant amount of manual labor. In this paper, we improve the process of mapping a "chaotic" set of files to an initial database schema that can then be iteratively refined...

10.14778/3685800.3685897 article EN Proceedings of the VLDB Endowment 2024-08-01

The growing adoption of data lakes for managing relational data necessitates efficient, open storage formats that provide high scan performance and competitive compression ratios. While existing formats achieve fast scans through lightweight encoding techniques, they have reached a plateau in terms of minimizing the storage footprint. Recently, correlation-aware schemes have been shown to reduce file sizes further. Yet, current approaches either incur significant overheads or require manual specification of correlations,...

10.48550/arxiv.2410.14066 preprint EN arXiv (Cornell University) 2024-10-17

I/O latency and throughput is one of the major performance bottlenecks for disk-based database systems. Upcoming persistent memory (PMem) technologies, like Intel's Optane DC Persistent Memory Modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide one of the first performance evaluations of PMem in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient usage of PMem and two essential I/O primitives tuned for PMem: log writing and block flushing.

10.48550/arxiv.1904.01614 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional...

10.48550/arxiv.2309.06354 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01