Thomas Neumann

ORCID: 0000-0001-5787-142X
Research Areas
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Advanced Data Storage Technologies
  • Distributed Systems and Fault Tolerance
  • Cloud Computing and Resource Management
  • Semantic Web and Ontologies
  • Algorithms and Data Compression
  • Parallel Computing and Optimization Techniques
  • Graph Theory and Algorithms
  • Caching and Content Delivery
  • Data Quality and Management
  • Distributed and Parallel Computing Systems
  • Scientific Computing and Data Management
  • Web Data Mining and Analysis
  • Peer-to-Peer Network Technologies
  • Data Mining Algorithms and Applications
  • Service-Oriented Architecture and Web Services
  • Data Stream Mining Techniques
  • Time Series Analysis and Forecasting
  • Advanced Image and Video Retrieval Techniques
  • Bayesian Modeling and Causal Inference
  • Recommender Systems and Techniques
  • Computational Geometry and Mesh Generation
  • Constraint Satisfaction and Optimization
  • Geographic Information Systems Studies

Technical University of Munich
2015-2024

North Central College
2022

Virginia Commonwealth University
2022

Intel (United States)
2020

Max Planck Institute for Informatics
2006-2018

Klinikum Wilhelmshaven
2018

Tableau Software (United States)
2018

Max Planck Society
2006-2011

Darmstadt University of Applied Sciences
1980-2006

University of Mannheim
2003-2005

The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems, one database for OLTP and a so-called data warehouse for OLAP. While allowing decent transaction rates, this separation has many disadvantages, including data freshness issues due to the delay caused by only periodically initiating the Extract-Transform-Load data staging, and excessive resource...

10.1109/icde.2011.5767867 article EN 2011-04-01

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature and flexible pattern-matching capabilities of its query language, SPARQL, entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation that achieves excellent performance by pursuing a RISC-style architecture with streamlined...

10.14778/1453856.1453927 article EN Proceedings of the VLDB Endowment 2008-08-01

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all of them routinely produce large errors. We further show that while cardinality estimates are essential for finding a good join order, query performance is unsatisfactory if the engine relies too heavily on...

10.14778/2850583.2850594 article EN Proceedings of the VLDB Endowment 2015-11-01
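The estimation problem this paper studies can be illustrated with a toy example (hypothetical data, not from the benchmark): under the independence assumption, an optimizer multiplies per-predicate selectivities, which badly underestimates joins whose predicates are correlated.

```python
from itertools import product

# Hypothetical relations; values are chosen so that the two join
# predicates are fully correlated (a = id % 10 on both sides).
r = [(i, i % 10) for i in range(100)]          # (id, a)
s = [(i, i % 10) for i in range(100)]          # (id, b)

# True size of r JOIN s on (id = id AND a = b): the second predicate
# is redundant, so every id match also satisfies a = b.
true_size = sum(1 for (ri, ra), (si, sb) in product(r, s)
                if ri == si and ra == sb)

# Independence estimate multiplies the individual selectivities:
# sel(id = id) = 1/100 and sel(a = b) = 1/10.
est_size = len(r) * len(s) / 100 / 10

print(true_size, est_size)  # 100 10.0 -- a 10x underestimate
```

With six or more joins, as in JOB, such per-join errors compound multiplicatively, which is why the paper finds estimates that are off by orders of magnitude.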

As main memory grows, query performance is more and more determined by the raw CPU costs of query processing itself. The classical iterator-style query processing technique is very simple and flexible, but shows poor performance on modern CPUs due to lack of locality and frequent instruction mispredictions. Several techniques like batch-oriented processing or vectorized tuple processing have been proposed in the past to improve this situation, but even these are frequently out-performed by hand-written execution plans. In this work we present a novel compilation strategy that translates...

10.14778/2002938.2002940 article EN Proceedings of the VLDB Endowment 2011-06-01
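The contrast between the iterator model and data-centric compilation can be sketched as follows (an illustrative Python analogy, not the paper's LLVM-based implementation): the iterator style pays one virtual call per tuple per operator, while the compiled style fuses the whole pipeline into one tight loop over the base data.

```python
# Iterator (Volcano) style: each operator pulls tuples from its child.
def scan(rows):
    for r in rows:
        yield r

def select(child, pred):
    for r in child:
        if pred(r):
            yield r

def project(child, f):
    for r in child:
        yield f(r)

rows = [(i, i * 2) for i in range(10)]
volcano = list(project(select(scan(rows), lambda r: r[0] % 2 == 0),
                       lambda r: r[1]))

# "Compiled" style: the same pipeline as a single fused loop,
# mimicking the code a data-centric compiler would generate.
compiled = [b for (a, b) in rows if a % 2 == 0]

print(volcano == compiled)  # True
```

The generated loop keeps the current tuple in registers across operator boundaries, which is the locality benefit the abstract alludes to.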

Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main...

10.1109/icde.2013.6544812 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01
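The core idea can be sketched in a few lines (a simplified trie, not ART's adaptive Node4/16/48/256 layouts or path compression): a radix tree indexes fixed-length keys byte by byte, so lookup cost depends on key length rather than on the number of keys.

```python
class RadixNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}   # ART replaces this dict with adaptive node types
        self.value = None

class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key: bytes, value):
        node = self.root
        for byte in key:
            node = node.children.setdefault(byte, RadixNode())
        node.value = value

    def lookup(self, key: bytes):
        node = self.root
        for byte in key:
            node = node.children.get(byte)
            if node is None:
                return None
        return node.value

t = RadixTree()
t.insert((42).to_bytes(4, "big"), "row-42")
print(t.lookup((42).to_bytes(4, "big")))  # row-42
print(t.lookup((43).to_bytes(4, "big")))  # None
```

Unlike a hash table, such a trie also supports range scans, since keys are stored in byte order.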

With modern computer architecture evolving, two problems conspire against the state-of-the-art approaches in parallel query execution: (i) to take advantage of many-cores, all query work must be distributed evenly among (soon) hundreds of threads in order to achieve good speedup, yet (ii) dividing the work evenly is difficult even with accurate data statistics due to the complexity of modern out-of-order cores. As a result, existing approaches for plan-driven parallelism run into load balancing and context-switching bottlenecks, and are therefore no longer...

10.1145/2588555.2610507 article EN 2014-06-18
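The morsel-driven scheduling idea can be sketched as follows (a simplified model with assumed sizes, not the paper's NUMA-aware dispatcher): work is split into small fixed-size "morsels" that idle threads pull from a shared queue, so the load balances itself regardless of per-morsel cost.

```python
import queue
import threading

data = list(range(1_000))
morsel_size = 64                       # small units so no thread stalls

work = queue.Queue()
for i in range(0, len(data), morsel_size):
    work.put(data[i:i + morsel_size])

partials = []
lock = threading.Lock()

def worker():
    local = 0
    while True:
        try:
            morsel = work.get_nowait()  # pull-based: threads self-schedule
        except queue.Empty:
            break
        local += sum(x * x for x in morsel)  # per-morsel work
    with lock:
        partials.append(local)          # merge thread-local results once

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(partials) == sum(x * x for x in data))  # True
```

Because threads request work only when idle, a slow core simply takes fewer morsels instead of stalling the whole plan.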

Multi-Version Concurrency Control (MVCC) is a widely employed concurrency control mechanism, as it allows for execution modes where readers never block writers. However, most systems implement only snapshot isolation (SI) instead of full serializability. Adding serializability guarantees to existing SI implementations tends to be prohibitively expensive.

10.1145/2723372.2749436 article EN 2015-05-27
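The visibility rule behind MVCC can be sketched as follows (a minimal model with assumed timestamps, not the paper's in-place update scheme or its serializability validation): each write appends a version stamped with its commit timestamp, and a reader sees the newest version committed before its snapshot, so readers never block writers.

```python
class VersionedCell:
    """One record attribute with a chain of committed versions."""

    def __init__(self):
        self.versions = []                 # (commit_ts, value), ascending

    def write(self, commit_ts, value):
        self.versions.append((commit_ts, value))

    def read(self, snapshot_ts):
        # Newest version whose commit timestamp precedes the snapshot.
        visible = None
        for ts, value in self.versions:
            if ts <= snapshot_ts:
                visible = value
        return visible

cell = VersionedCell()
cell.write(10, "v1")
cell.write(20, "v2")
print(cell.read(15))  # v1 -- snapshot taken before v2 committed
print(cell.read(25))  # v2
```

Snapshot isolation falls out of this rule directly; the hard part the abstract refers to is checking, cheaply, that such snapshot reads are also serializable.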

Measurement of the nuclear DNA content allows the classification of human cancers as either diploid or aneuploid. To gain further insight into the mechanisms of aneuploidy, we compared the cytogenetic profile of mismatch-repair-deficient versus mismatch-repair-proficient aneuploid colorectal carcinoma cell lines using comparative genomic hybridization and spectral karyotyping. Aneuploid carcinomas revealed an average of 19 chromosomal imbalances per cell line. Such numerical aberrations were exceedingly scarce in...

10.1002/(sici)1098-2264(200002)27:2<183::aid-gcc10>3.0.co;2-p article EN Genes Chromosomes and Cancer 2000-02-01

With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans.

10.1145/1559845.1559911 article EN 2009-06-29

Accurate cardinality estimates are essential for successful query optimization. This is true not only for relational DBMSs but also for RDF stores. An RDF database consists of a set of triples and, hence, can be seen as a relational database with a single table of three attributes. This makes RDF rather special in that queries typically contain many self joins. We show that relational systems are not well-prepared to perform cardinality estimation in this context. Further, there are hardly any estimation methods for RDF databases. To overcome this lack of appropriate methods, we introduce characteristic sets together...

10.1109/icde.2011.5767868 article EN 2011-04-01
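The basic construction can be sketched as follows (toy triples, hypothetical data): the characteristic set of a subject is the set of predicates it occurs with, and counting how many subjects share each set yields accurate estimates for star-shaped SPARQL queries.

```python
from collections import defaultdict

triples = [
    ("alice", "name", "Alice"), ("alice", "age", 30),
    ("bob",   "name", "Bob"),   ("bob",   "age", 25),
    ("carol", "name", "Carol"),
]

# Group the predicates used by each subject.
preds_of = defaultdict(set)
for s, p, o in triples:
    preds_of[s].add(p)

# Count subjects per characteristic set.
char_sets = defaultdict(int)
for preds in preds_of.values():
    char_sets[frozenset(preds)] += 1

# Estimate for a star query "?s name ?n . ?s age ?a": sum the counts of
# all characteristic sets that contain both query predicates.
query_preds = {"name", "age"}
estimate = sum(n for cs, n in char_sets.items() if query_preds <= cs)
print(estimate)  # 2 -- alice and bob
```

The estimate is exact here because, unlike independent per-predicate statistics, characteristic sets capture which predicates co-occur on the same subject.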

Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated performance. In this work we take a new look at the well-known sort-merge join, which, so far, has not been in the focus of research on scalable data processing as it was deemed inferior to hash joins. We devise a suite of...

10.14778/2336664.2336678 article EN Proceedings of the VLDB Endowment 2012-06-01
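For reference, the textbook algorithm under discussion can be sketched as follows (single-threaded; the paper studies massively parallel, hardware-conscious variants): sort both inputs on the join key, then merge them with two cursors.

```python
def sort_merge_join(left, right, key=lambda r: r[0]):
    left, right = sorted(left, key=key), sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Emit the cross product of the matching run on the right.
            j2 = j
            while j2 < len(right) and key(right[j2]) == kl:
                out.append((left[i], right[j2]))
                j2 += 1
            i += 1
    return out

r = [(1, "a"), (2, "b"), (2, "c")]
s = [(2, "x"), (3, "y")]
print(sort_merge_join(r, s))
# [((2, 'b'), (2, 'x')), ((2, 'c'), (2, 'x'))]
```

The appeal on modern hardware is that both the sort and the merge phases scan memory sequentially, which parallelizes and vectorizes well.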

Query optimizers rely on accurate estimations of the sizes of intermediate results. Wrong size estimations can lead to overly expensive execution plans. We first define the q-error to measure deviations of size estimates from actual sizes. The q-error enables the derivation of two important results: (1) We provide bounds such that if the q-error is smaller than this bound, the query optimizer constructs an optimal plan. (2) If the q-error is bounded by a number q, we show that the cost of the produced plan is at most a factor of q^4 worse than optimal. Motivated by these findings, we next show how to find the best...

10.14778/1687627.1687738 article EN Proceedings of the VLDB Endowment 2009-08-01
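The metric itself is simple enough to sketch directly: the q-error is the maximum of the over- and under-estimation factors, so it penalizes both directions symmetrically.

```python
def q_error(estimate, actual):
    """Symmetric multiplicative deviation of an estimate from the truth."""
    assert estimate > 0 and actual > 0
    return max(estimate / actual, actual / estimate)

print(q_error(100, 1000))   # 10.0 -- underestimate by 10x
print(q_error(1000, 100))   # 10.0 -- overestimate by 10x
print(q_error(500, 500))    # 1.0  -- perfect estimate

# Result (2) of the paper: if every estimate has q-error <= q, the
# chosen plan costs at most q^4 times the optimal plan.
q = q_error(100, 1000)
cost_bound_factor = q ** 4
print(cost_bound_factor)    # 10000.0
```

Unlike relative error, the q-error treats an estimate of 100 for an actual 1000 the same as 1000 for 100, which matches how misestimates hurt plan choice in either direction.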

The RDF data model is gaining importance for applications in computational biology, knowledge sharing, and social communities. Recent work on RDF engines has focused on scalable performance for querying, and has largely disregarded updates. In addition to incremental bulk loading, applications also require online updates with flexible control over multi-user isolation levels and consistency. The challenge lies in meeting these requirements while retaining the capability of fast querying. This paper presents a comprehensive solution that...

10.14778/1920841.1920877 article EN Proceedings of the VLDB Endowment 2010-09-01

This work aims at reducing the main-memory footprint in high-performance hybrid OLTP & OLAP databases, while retaining high query performance and transactional throughput. For this purpose, an innovative compressed columnar storage format for cold data, called Data Blocks, is introduced. We further incorporate a new light-weight index structure, the Positional SMA, that narrows scan ranges within a block even if the entire block cannot be ruled out. To achieve highest performance, the compression schemes of Data Blocks are very light-weight,...

10.1145/2882903.2882925 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14
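The pruning idea behind small materialized aggregates (SMAs) can be sketched as follows (block-level min/max pruning only; the paper's Positional SMAs additionally narrow the scan range *inside* a surviving block):

```python
# Hypothetical cold data, chunked into blocks of 100 values each.
blocks = [list(range(lo, lo + 100)) for lo in (0, 100, 200, 300)]

# SMAs: per-block min/max, built once when the data is frozen.
smas = [(min(b), max(b)) for b in blocks]

def scan_ge(blocks, smas, threshold):
    """Scan for values >= threshold, skipping provably empty blocks."""
    out = []
    for block, (lo, hi) in zip(blocks, smas):
        if hi < threshold:            # SMA proves no qualifying value
            continue
        out.extend(v for v in block if v >= threshold)
    return out

result = scan_ge(blocks, smas, 250)
print(len(result))  # 150 -- the first two blocks were never touched
```

Skipping whole compressed blocks this way is what lets cold, compressed data stay queryable without a large scan penalty.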

While standardized and widely used benchmarks address either operational or real-time Business Intelligence (BI) workloads, the lack of a hybrid benchmark led us to the definition of a new, complex, mixed workload benchmark, called the CH-benCHmark. It bridges the gap between the established single-workload suites TPC-C for OLTP and TPC-H for OLAP, and executes a complex mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite run in parallel on the same tables in a single database system....

10.1145/1988842.1988850 article EN 2011-06-13

Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and slow to build. In fact, most approaches that we are aware of require multiple training passes over the data.

10.1145/3401071.3401659 article EN 2020-06-03

Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We investigate the impact of caching, pipelining, dataset size, and key size. We study...

10.14778/3421424.3421425 article EN Proceedings of the VLDB Endowment 2020-09-01
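The learned-index idea being benchmarked can be sketched as follows (a crude linear fit with an assumed error bound of 8 positions, not one of the benchmarked structures): fit a model that maps a key to its position in a sorted array, then correct the prediction with a bounded local search.

```python
import bisect

keys = [i * 7 for i in range(1_000)]   # sorted keys with a linear-ish shape

# "Train": a linear model through the endpoints.
slope = (len(keys) - 1) / (keys[-1] - keys[0])

def lookup(key, err=8):
    pred = int(slope * (key - keys[0]))          # predicted position
    lo = max(0, pred - err)                      # bounded correction window
    hi = min(len(keys), pred + err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None

print(lookup(7 * 123))   # 123
print(lookup(5))         # None
```

The model replaces the upper levels of a B-Tree: when the prediction is tight, the final search touches only a handful of cache lines, which is where the read-only speedups come from.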

Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate it into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient...

10.1145/3183713.3196897 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Graph analytics on social networks, Web data, and communication networks has been widely used in a plethora of applications. Many graph analytics algorithms are based on breadth-first search (BFS) traversal, which is not only time-consuming for large datasets but also involves much redundant computation when executed multiple times from different start vertices. In this paper, we propose Multi-Source BFS (MS-BFS), an algorithm that is designed to run multiple concurrent BFSs over the same graph on a single CPU core while...

10.14778/2735496.2735507 article EN Proceedings of the VLDB Endowment 2014-12-01
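The bit-parallel trick can be sketched as follows (Python ints standing in for the paper's register-width bitsets, on a tiny assumed graph): bit position i tracks BFS number i, so one pass over the frontier advances many traversals at once.

```python
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
sources = [0, 4]                          # BFS i owns bit i

seen = {v: 0 for v in graph}              # which BFSs have visited v
frontier = {v: 0 for v in graph}          # which BFSs sit at v now
for i, s in enumerate(sources):
    seen[s] |= 1 << i
    frontier[s] |= 1 << i

dist = {(s, s): 0 for s in sources}
level = 0
while any(frontier.values()):
    level += 1
    nxt = {v: 0 for v in graph}
    for v, bits in frontier.items():
        if not bits:
            continue
        for w in graph[v]:
            new = bits & ~seen[w]          # BFSs reaching w for the first time
            if new:
                nxt[w] |= new
                seen[w] |= new
                for i, s in enumerate(sources):
                    if new >> i & 1:
                        dist[(s, w)] = level
    frontier = nxt

print(dist[(0, 4)], dist[(4, 0)])  # 3 3
```

A neighbor list is traversed once per level for all BFSs together, which is exactly the redundant work the abstract says repeated single-source runs would do over and over.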

So far, transactional memory, although a promising technique, has suffered from the absence of an efficient hardware implementation. The upcoming Haswell microarchitecture from Intel introduces hardware transactional memory (HTM) in mainstream CPUs. HTM allows for efficient concurrent, atomic operations, which is also highly desirable in the context of databases. On the other hand, HTM has several limitations that, in general, prevent a one-to-one mapping of database transactions to HTM transactions. In this work we devise several building blocks that can be used to exploit...

10.1109/icde.2014.6816683 article EN 2014-03-01

Spatial data is pervasive. Large amounts of spatial data are produced every day from GPS-enabled devices such as cell phones, cars, and sensors, and from various consumer-based applications such as Uber, location-tagged posts in Facebook, Instagram, Snapchat, etc. This growth, coupled with the fact that spatial queries, analytical or transactional, can be computationally extensive, has attracted enormous interest from the research community to develop systems that efficiently process and analyze this data. In recent years a lot of spatial analytics systems have...

10.14778/3236187.3236213 article EN Proceedings of the VLDB Endowment 2018-07-01