Thomas Neumann

ORCID: 0000-0001-5787-142X
Research Areas
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Advanced Data Storage Technologies
  • Distributed Systems and Fault Tolerance
  • Cloud Computing and Resource Management
  • Semantic Web and Ontologies
  • Algorithms and Data Compression
  • Parallel Computing and Optimization Techniques
  • Graph Theory and Algorithms
  • Caching and Content Delivery
  • Data Quality and Management
  • Distributed and Parallel Computing Systems
  • Scientific Computing and Data Management
  • Web Data Mining and Analysis
  • Peer-to-Peer Network Technologies
  • Data Mining Algorithms and Applications
  • Service-Oriented Architecture and Web Services
  • Data Stream Mining Techniques
  • Time Series Analysis and Forecasting
  • Advanced Image and Video Retrieval Techniques
  • Bayesian Modeling and Causal Inference
  • Recommender Systems and Techniques
  • Computational Geometry and Mesh Generation
  • Constraint Satisfaction and Optimization
  • Geographic Information Systems Studies

Technical University of Munich
2015-2024

North Central College
2022

Virginia Commonwealth University
2022

Intel (United States)
2020

Max Planck Institute for Informatics
2006-2018

Klinikum Wilhelmshaven
2018

Tableau Software (United States)
2018

Max Planck Society
2006-2011

Darmstadt University of Applied Sciences
1980-2006

University of Mannheim
2003-2005

The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems, one database for OLTP and a so-called data warehouse for OLAP. While allowing decent transaction rates, this separation has many disadvantages, including data freshness issues due to the delay caused by only periodically initiating the Extract-Transform-Load data staging, and excessive resource...

10.1109/icde.2011.5767867 article EN 2011-04-01

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature and flexible pattern-matching capabilities of its query language, SPARQL, entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation that achieves excellent performance by pursuing a RISC-style architecture with streamlined...

10.14778/1453856.1453927 article EN Proceedings of the VLDB Endowment 2008-08-01

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all of them routinely produce large errors. We further show that while cardinality estimates are essential for finding a good join order, query performance is unsatisfactory if the engine relies too heavily on...

10.14778/2850583.2850594 article EN Proceedings of the VLDB Endowment 2015-11-01
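The estimation problem this paper studies can be illustrated with a toy example (hypothetical data, not from the benchmark): under the independence assumption, an optimizer multiplies per-predicate selectivities, which badly underestimates joins whose predicates are correlated.

```python
from itertools import product

# Hypothetical relations; values are chosen so that the two join
# predicates are fully correlated (a = id % 10 on both sides).
r = [(i, i % 10) for i in range(100)]          # (id, a)
s = [(i, i % 10) for i in range(100)]          # (id, b)

# True size of r JOIN s on (id = id AND a = b): the second predicate
# is redundant, so every id match also satisfies a = b.
true_size = sum(1 for (ri, ra), (si, sb) in product(r, s)
                if ri == si and ra == sb)

# Independence estimate multiplies the individual selectivities:
# sel(id = id) = 1/100 and sel(a = b) = 1/10.
est_size = len(r) * len(s) / 100 / 10

print(true_size, est_size)  # 100 10.0 -- a 10x underestimate
```

With six or more joins, as in JOB, such per-join errors compound multiplicatively, which is why the paper finds estimates that are off by orders of magnitude.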

As main memory grows, query performance is more and more determined by the raw CPU costs of query processing itself. The classical iterator-style query processing technique is very simple and flexible, but shows poor performance on modern CPUs due to lack of locality and frequent instruction mispredictions. Several techniques like batch-oriented processing or vectorized tuple processing have been proposed in the past to improve this situation, but even these are frequently out-performed by hand-written execution plans. In this work we present a novel compilation strategy that translates...

10.14778/2002938.2002940 article EN Proceedings of the VLDB Endowment 2011-06-01
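The contrast between the iterator model and data-centric compilation can be sketched as follows (an illustrative Python analogy, not the paper's LLVM-based implementation): the iterator style pays one virtual call per tuple per operator, while the compiled style fuses the whole pipeline into one tight loop over the base data.

```python
# Iterator (Volcano) style: each operator pulls tuples from its child.
def scan(rows):
    for r in rows:
        yield r

def select(child, pred):
    for r in child:
        if pred(r):
            yield r

def project(child, f):
    for r in child:
        yield f(r)

rows = [(i, i * 2) for i in range(10)]
volcano = list(project(select(scan(rows), lambda r: r[0] % 2 == 0),
                       lambda r: r[1]))

# "Compiled" style: the same pipeline as a single fused loop,
# mimicking the code a data-centric compiler would generate.
compiled = [b for (a, b) in rows if a % 2 == 0]

print(volcano == compiled)  # True
```

The generated loop keeps the current tuple in registers across operator boundaries, which is the locality benefit the abstract alludes to.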

Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main...

10.1109/icde.2013.6544812 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01
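The core idea can be sketched in a few lines (a simplified trie, not ART's adaptive Node4/16/48/256 layouts or path compression): a radix tree indexes fixed-length keys byte by byte, so lookup cost depends on key length rather than on the number of keys.

```python
class RadixNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}   # ART replaces this dict with adaptive node types
        self.value = None

class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key: bytes, value):
        node = self.root
        for byte in key:
            node = node.children.setdefault(byte, RadixNode())
        node.value = value

    def lookup(self, key: bytes):
        node = self.root
        for byte in key:
            node = node.children.get(byte)
            if node is None:
                return None
        return node.value

t = RadixTree()
t.insert((42).to_bytes(4, "big"), "row-42")
print(t.lookup((42).to_bytes(4, "big")))  # row-42
print(t.lookup((43).to_bytes(4, "big")))  # None
```

Unlike a hash table, such a trie also supports range scans, since keys are stored in byte order.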

With modern computer architecture evolving, two problems conspire against the state-of-the-art approaches in parallel query execution: (i) to take advantage of many-cores, all query work must be distributed evenly among (soon) hundreds of threads in order to achieve good speedup, yet (ii) dividing the work evenly is difficult even with accurate data statistics due to the complexity of modern out-of-order cores. As a result, existing approaches for plan-driven parallelism run into load balancing and context-switching bottlenecks, and are therefore no longer...

10.1145/2588555.2610507 article EN 2014-06-18
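The morsel-driven scheduling idea can be sketched as follows (a simplified model with assumed sizes, not the paper's NUMA-aware dispatcher): work is split into small fixed-size "morsels" that idle threads pull from a shared queue, so the load balances itself regardless of per-morsel cost.

```python
import queue
import threading

data = list(range(1_000))
morsel_size = 64                       # small units so no thread stalls

work = queue.Queue()
for i in range(0, len(data), morsel_size):
    work.put(data[i:i + morsel_size])

partials = []
lock = threading.Lock()

def worker():
    local = 0
    while True:
        try:
            morsel = work.get_nowait()  # pull-based: threads self-schedule
        except queue.Empty:
            break
        local += sum(x * x for x in morsel)  # per-morsel work
    with lock:
        partials.append(local)          # merge thread-local results once

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(partials) == sum(x * x for x in data))  # True
```

Because threads request work only when idle, a slow core simply takes fewer morsels instead of stalling the whole plan.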

Multi-Version Concurrency Control (MVCC) is a widely employed concurrency control mechanism, as it allows for execution modes where readers never block writers. However, most systems implement only snapshot isolation (SI) instead of full serializability. Adding serializability guarantees to existing SI implementations tends to be prohibitively expensive.

10.1145/2723372.2749436 article EN 2015-05-27
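The visibility rule behind MVCC can be sketched as follows (a minimal model with assumed timestamps, not the paper's in-place update scheme or its serializability validation): each write appends a version stamped with its commit timestamp, and a reader sees the newest version committed before its snapshot, so readers never block writers.

```python
class VersionedCell:
    """One record attribute with a chain of committed versions."""

    def __init__(self):
        self.versions = []                 # (commit_ts, value), ascending

    def write(self, commit_ts, value):
        self.versions.append((commit_ts, value))

    def read(self, snapshot_ts):
        # Newest version whose commit timestamp precedes the snapshot.
        visible = None
        for ts, value in self.versions:
            if ts <= snapshot_ts:
                visible = value
        return visible

cell = VersionedCell()
cell.write(10, "v1")
cell.write(20, "v2")
print(cell.read(15))  # v1 -- snapshot taken before v2 committed
print(cell.read(25))  # v2
```

Snapshot isolation falls out of this rule directly; the hard part the abstract refers to is checking, cheaply, that such snapshot reads are also serializable.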

Measurement of the nuclear DNA content allows the classification of human cancers as either diploid or aneuploid. To gain further insight into the mechanisms of aneuploidy, we compared the cytogenetic profile of mismatch-repair-deficient versus mismatch-repair-proficient aneuploid colorectal carcinoma cell lines using comparative genomic hybridization and spectral karyotyping. Aneuploid carcinomas revealed an average of 19 chromosomal imbalances per cell line. Such numerical aberrations were exceedingly scarce in...

10.1002/(sici)1098-2264(200002)27:2<183::aid-gcc10>3.0.co;2-p article EN Genes Chromosomes and Cancer 2000-02-01

With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans.

10.1145/1559845.1559911 article EN 2009-06-29

Accurate cardinality estimates are essential for successful query optimization. This is true not only for relational DBMSs but also for RDF stores. An RDF database consists of a set of triples and, hence, can be seen as a relational database with a single table of three attributes. This makes RDF rather special in that queries typically contain many self joins. We show that relational systems are not well-prepared to perform cardinality estimation in this context. Further, there are hardly any estimation methods for RDF databases. To overcome this lack of appropriate methods, we introduce characteristic sets together...

10.1109/icde.2011.5767868 article EN 2011-04-01
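The basic construction can be sketched as follows (toy triples, hypothetical data): the characteristic set of a subject is the set of predicates it occurs with, and counting how many subjects share each set yields accurate estimates for star-shaped SPARQL queries.

```python
from collections import defaultdict

triples = [
    ("alice", "name", "Alice"), ("alice", "age", 30),
    ("bob",   "name", "Bob"),   ("bob",   "age", 25),
    ("carol", "name", "Carol"),
]

# Group the predicates used by each subject.
preds_of = defaultdict(set)
for s, p, o in triples:
    preds_of[s].add(p)

# Count subjects per characteristic set.
char_sets = defaultdict(int)
for preds in preds_of.values():
    char_sets[frozenset(preds)] += 1

# Estimate for a star query "?s name ?n . ?s age ?a": sum the counts of
# all characteristic sets that contain both query predicates.
query_preds = {"name", "age"}
estimate = sum(n for cs, n in char_sets.items() if query_preds <= cs)
print(estimate)  # 2 -- alice and bob
```

The estimate is exact here because, unlike independent per-predicate statistics, characteristic sets capture which predicates co-occur on the same subject.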

Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated performance. In this work we take a new look at the well-known sort-merge join, which, so far, has not been in the focus of research on scalable data processing as it was deemed inferior to hash joins. We devise a suite of...

10.14778/2336664.2336678 article EN Proceedings of the VLDB Endowment 2012-06-01
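For reference, the textbook algorithm under discussion can be sketched as follows (single-threaded; the paper studies massively parallel, hardware-conscious variants): sort both inputs on the join key, then merge them with two cursors.

```python
def sort_merge_join(left, right, key=lambda r: r[0]):
    left, right = sorted(left, key=key), sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Emit the cross product of the matching run on the right.
            j2 = j
            while j2 < len(right) and key(right[j2]) == kl:
                out.append((left[i], right[j2]))
                j2 += 1
            i += 1
    return out

r = [(1, "a"), (2, "b"), (2, "c")]
s = [(2, "x"), (3, "y")]
print(sort_merge_join(r, s))
# [((2, 'b'), (2, 'x')), ((2, 'c'), (2, 'x'))]
```

The appeal on modern hardware is that both the sort and the merge phases scan memory sequentially, which parallelizes and vectorizes well.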

Query optimizers rely on accurate estimations of the sizes of intermediate results. Wrong size estimations can lead to overly expensive execution plans. We first define the q-error to measure deviations of size estimates from actual sizes. The q-error enables the derivation of two important results: (1) We provide bounds such that if the q-error is smaller than this bound, the query optimizer constructs an optimal plan. (2) If the q-error is bounded by a number q, we show that the cost of the produced plan is at most a factor of q^4 worse than optimal. Motivated by these findings, we next show how to find the best...

10.14778/1687627.1687738 article EN Proceedings of the VLDB Endowment 2009-08-01
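The metric itself is simple enough to sketch directly: the q-error is the maximum of the over- and under-estimation factors, so it penalizes both directions symmetrically.

```python
def q_error(estimate, actual):
    """Symmetric multiplicative deviation of an estimate from the truth."""
    assert estimate > 0 and actual > 0
    return max(estimate / actual, actual / estimate)

print(q_error(100, 1000))   # 10.0 -- underestimate by 10x
print(q_error(1000, 100))   # 10.0 -- overestimate by 10x
print(q_error(500, 500))    # 1.0  -- perfect estimate

# Result (2) of the paper: if every estimate has q-error <= q, the
# chosen plan costs at most q^4 times the optimal plan.
q = q_error(100, 1000)
cost_bound_factor = q ** 4
print(cost_bound_factor)    # 10000.0
```

Unlike relative error, the q-error treats an estimate of 100 for an actual 1000 the same as 1000 for 100, which matches how misestimates hurt plan choice in either direction.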

The RDF data model is gaining importance for applications in computational biology, knowledge sharing, and social communities. Recent work on RDF engines has focused on scalable performance for querying, and has largely disregarded updates. In addition to incremental bulk loading, applications also require online updates with flexible control over multi-user isolation levels and consistency. The challenge lies in meeting these requirements while retaining the capability of fast querying. This paper presents a comprehensive solution that...

10.14778/1920841.1920877 article EN Proceedings of the VLDB Endowment 2010-09-01

This work aims at reducing the main-memory footprint in high-performance hybrid OLTP & OLAP databases, while retaining high query performance and transactional throughput. For this purpose, an innovative compressed columnar storage format for cold data, called Data Blocks, is introduced. We further incorporate a new light-weight index structure, the Positional SMA, that narrows scan ranges within a block even if the entire block cannot be ruled out. To achieve highest performance, the compression schemes of Data Blocks are very light-weight,...

10.1145/2882903.2882925 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14
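The pruning idea behind small materialized aggregates (SMAs) can be sketched as follows (block-level min/max pruning only; the paper's Positional SMAs additionally narrow the scan range *inside* a surviving block):

```python
# Hypothetical cold data, chunked into blocks of 100 values each.
blocks = [list(range(lo, lo + 100)) for lo in (0, 100, 200, 300)]

# SMAs: per-block min/max, built once when the data is frozen.
smas = [(min(b), max(b)) for b in blocks]

def scan_ge(blocks, smas, threshold):
    """Scan for values >= threshold, skipping provably empty blocks."""
    out = []
    for block, (lo, hi) in zip(blocks, smas):
        if hi < threshold:            # SMA proves no qualifying value
            continue
        out.extend(v for v in block if v >= threshold)
    return out

result = scan_ge(blocks, smas, 250)
print(len(result))  # 150 -- the first two blocks were never touched
```

Skipping whole compressed blocks this way is what lets cold, compressed data stay queryable without a large scan penalty.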

While standardized and widely used benchmarks address either operational or real-time Business Intelligence (BI) workloads, the lack of a hybrid benchmark led us to the definition of a new, complex, mixed workload benchmark, called the CH-benCHmark. It bridges the gap between the established single-workload suites TPC-C for OLTP and TPC-H for OLAP, and executes a complex mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite run in parallel on the same tables in a single database system....

10.1145/1988842.1988850 article EN 2011-06-13

Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and slow to build. In fact, most approaches that we are aware of require multiple training passes over the data.

10.1145/3401071.3401659 article EN 2020-06-03

Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We investigate the impact of caching, pipelining, dataset size, and key size. We study...

10.14778/3421424.3421425 article EN Proceedings of the VLDB Endowment 2020-09-01
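The learned-index idea being benchmarked can be sketched as follows (a crude linear fit with an assumed error bound of 8 positions, not one of the benchmarked structures): fit a model that maps a key to its position in a sorted array, then correct the prediction with a bounded local search.

```python
import bisect

keys = [i * 7 for i in range(1_000)]   # sorted keys with a linear-ish shape

# "Train": a linear model through the endpoints.
slope = (len(keys) - 1) / (keys[-1] - keys[0])

def lookup(key, err=8):
    pred = int(slope * (key - keys[0]))          # predicted position
    lo = max(0, pred - err)                      # bounded correction window
    hi = min(len(keys), pred + err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None

print(lookup(7 * 123))   # 123
print(lookup(5))         # None
```

The model replaces the upper levels of a B-Tree: when the prediction is tight, the final search touches only a handful of cache lines, which is where the read-only speedups come from.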

Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate it into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient...

10.1145/3183713.3196897 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Graph analytics on social networks, Web data, and communication networks has been widely used in a plethora of applications. Many graph analytics algorithms are based on breadth-first search (BFS) traversal, which is not only time-consuming for large datasets but also involves much redundant computation when executed multiple times from different start vertices. In this paper, we propose Multi-Source BFS (MS-BFS), an algorithm that is designed to run multiple concurrent BFSs over the same graph on a single CPU core while...

10.14778/2735496.2735507 article EN Proceedings of the VLDB Endowment 2014-12-01
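The bit-parallel trick can be sketched as follows (Python ints standing in for the paper's register-width bitsets, on a tiny assumed graph): bit position i tracks BFS number i, so one pass over the frontier advances many traversals at once.

```python
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
sources = [0, 4]                          # BFS i owns bit i

seen = {v: 0 for v in graph}              # which BFSs have visited v
frontier = {v: 0 for v in graph}          # which BFSs sit at v now
for i, s in enumerate(sources):
    seen[s] |= 1 << i
    frontier[s] |= 1 << i

dist = {(s, s): 0 for s in sources}
level = 0
while any(frontier.values()):
    level += 1
    nxt = {v: 0 for v in graph}
    for v, bits in frontier.items():
        if not bits:
            continue
        for w in graph[v]:
            new = bits & ~seen[w]          # BFSs reaching w for the first time
            if new:
                nxt[w] |= new
                seen[w] |= new
                for i, s in enumerate(sources):
                    if new >> i & 1:
                        dist[(s, w)] = level
    frontier = nxt

print(dist[(0, 4)], dist[(4, 0)])  # 3 3
```

A neighbor list is traversed once per level for all BFSs together, which is exactly the redundant work the abstract says repeated single-source runs would do over and over.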

So far, transactional memory, although a promising technique, has suffered from the absence of an efficient hardware implementation. The upcoming Haswell microarchitecture from Intel introduces hardware transactional memory (HTM) in mainstream CPUs. HTM allows for efficient concurrent, atomic operations, which is also highly desirable in the context of databases. On the other hand, HTM has several limitations that, in general, prevent a one-to-one mapping of database transactions to HTM transactions. In this work we devise several building blocks that can be used to exploit...

10.1109/icde.2014.6816683 article EN 2014-03-01

Spatial data is pervasive. Large amounts of spatial data are produced every day from GPS-enabled devices such as cell phones, cars, and sensors, and from various consumer-based applications such as Uber, location-tagged posts in Facebook, Instagram, Snapchat, etc. This growth, coupled with the fact that spatial queries, analytical or transactional, can be computationally extensive, has attracted enormous interest from the research community to develop systems that efficiently process and analyze this data. In recent years a lot of spatial analytics systems have...

10.14778/3236187.3236213 article EN Proceedings of the VLDB Endowment 2018-07-01