- Advanced Database Systems and Queries
- Data Management and Algorithms
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Distributed Systems and Fault Tolerance
- Data Mining Algorithms and Applications
- Algorithms and Data Compression
- Scientific Computing and Data Management
- Graph Theory and Algorithms
- Blockchain Technology Applications and Security
- Caching and Content Delivery
- Parallel Computing and Optimization Techniques
- Meteorological Phenomena and Simulations
- Data Stream Mining Techniques
- Big Data and Business Intelligence
- Web Data Mining and Analysis
- Fire Effects on Ecosystems
- Semantic Web and Ontologies
- Peer-to-Peer Network Technologies
- Software Testing and Debugging Techniques
- Data Quality and Management
- Plant Water Relations and Carbon Dynamics
- Advanced Text Analysis Techniques
- Information Systems Education and Curriculum Development
- Optimization and Search Problems
Saarland University
2013-2023
Max Planck Institute for Informatics
2009
Max Planck Society
2009
ETH Zurich
2008
Philipps University of Marburg
2002
One of the main reasons why cloud computing has gained so much popularity is its ease of use and its ability to scale resources on demand. As a result, users can now rent nodes of large commercial clusters through several vendors, such as Amazon and Rackspace. However, despite the attention paid by cloud providers, performance unpredictability is a major issue in cloud computing for (1) database researchers performing wall-clock experiments, and (2) applications providing service-level agreements. In this paper, we carry out...
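A minimal sketch of how such unpredictability can be quantified over repeated runs; the coefficient-of-variation metric and the sample runtimes below are illustrative, not the paper's methodology or measurements:

```python
# Sketch: quantifying performance unpredictability across repeated runs.
# Metric and sample data are illustrative only.
import statistics

def coefficient_of_variation(samples):
    """Relative dispersion: stdev / mean. Higher means less predictable."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical wall-clock runtimes (seconds) of the same job on rented nodes.
runtimes = [52.1, 49.8, 71.3, 50.2, 68.9, 51.5]
print(f"COV = {coefficient_of_variation(runtimes):.2%}")
```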
MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, it allows non-expert users to run complex analytical tasks over very large data sets on clusters and clouds. However, this comes at a price: MapReduce processes tasks in a scan-oriented fashion. Hence, the performance of Hadoop --- an open-source implementation of MapReduce --- often does not match the one of a well-configured DBMS. In this paper we propose a new type of system named Hadoop++: it boosts task performance without changing...
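A minimal sketch of the general remedy for scan-oriented processing, assuming a per-split index consulted at read time; the IndexedSplit class and its layout are hypothetical illustrations, not Hadoop++'s actual mechanism:

```python
# Sketch: attaching a per-split index so a job can avoid full scans.
# Names and structure are illustrative, not Hadoop++'s on-disk format.
import bisect

class IndexedSplit:
    def __init__(self, records):                  # records: (key, payload)
        self.records = sorted(records)            # index built at load time,
        self.keys = [k for k, _ in self.records]  # invisible to the framework

    def range_lookup(self, lo, hi):
        """Binary-search the split instead of scanning it entirely."""
        start = bisect.bisect_left(self.keys, lo)
        stop = bisect.bisect_right(self.keys, hi)
        return self.records[start:stop]

split = IndexedSplit([(7, "g"), (2, "b"), (5, "e"), (9, "i")])
print(split.range_lookup(3, 8))   # [(5, 'e'), (7, 'g')]
```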
This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming history. There are many techniques that can be used to boost the performance of MapReduce jobs by orders of magnitude. In this tutorial we teach such techniques. First, we will briefly familiarize the audience...
Within the last few years, a countless number of blockchain systems have emerged on the market, each one claiming to revolutionize the way of distributed transaction processing in one way or the other. Many features, such as byzantine fault tolerance, are indeed valuable additions in modern environments. However, despite all the hype around the technology, many of the challenges that blockchain systems face are fundamental data management problems. These are largely shared with traditional database systems, which have been studied for decades already. These similarities become...
MapReduce is becoming ubiquitous in large-scale data analysis. Several recent works have shown that the performance of Hadoop could be improved, for instance, by creating indexes in a non-invasive manner. However, they ignore the impact of the data layout used inside the data blocks of the Hadoop Distributed File System (HDFS). In this paper, we analyze different data layouts in detail in the context of MapReduce and argue that Row, Column, and PAX layouts can lead to poor system performance. We propose a new data layout, coined Trojan Layout, that internally organizes data blocks into attribute...
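A minimal sketch of the underlying layout idea, reorganizing rows inside one block into per-attribute groups; the data and helper below are illustrative, not the actual Trojan Layout format:

```python
# Sketch: reorganizing the rows of one HDFS-style block into per-attribute
# groups (PAX-like), so a query touching few attributes reads less data.
# Illustrates the layout idea only, not the actual Trojan Layout format.

rows = [(1, "alice", 3.4), (2, "bob", 2.9), (3, "carol", 3.8)]

def to_column_groups(rows):
    """Row-major -> one contiguous group per attribute within the block."""
    return [list(col) for col in zip(*rows)]

block = to_column_groups(rows)
# A scan of attribute 1 ("name") now touches a single contiguous group:
print(block[1])   # ['alice', 'bob', 'carol']
```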
Relational equi-joins are at the heart of almost every query plan. They have been studied, improved, and re-examined on a regular basis since the existence of the database community. In the past four years several new join algorithms have been proposed and experimentally evaluated. Some of those papers contradict each other in their experimental findings. This makes it surprisingly hard to answer a very simple question: what is the fastest join algorithm in 2015? In this paper we will try to develop an answer. We start with end-to-end black...
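For reference, a textbook single-threaded hash equi-join of the kind such end-to-end comparisons start from; none of the paper's optimized algorithms is reproduced here:

```python
# Sketch: a plain hash equi-join baseline (build phase, then probe phase).
from collections import defaultdict

def hash_join(build, probe):
    """Join two lists of (key, payload) tuples on their keys."""
    table = defaultdict(list)
    for key, payload in build:          # build phase
        table[key].append(payload)
    return [(key, b, p)                 # probe phase
            for key, p in probe
            for b in table.get(key, [])]

print(hash_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")]))  # [(2, 'b', 'x')]
```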
Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to consume only parts of the inputs before responding. However, the teaching time to make them do so is high. So high that the lessons often do not pay off. We take a different approach. We make elephants aggressive; this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves runtimes of several classes of MapReduce jobs. HAIL changes...
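A miniature of the core idea, assuming HDFS's existing three-way replication is reused so that each replica is sorted on a different attribute at upload time; the data and attribute choice are illustrative:

```python
# Sketch: HDFS already writes each block several times; sorting every copy
# on a different attribute at upload time yields one clustered order per
# replica essentially for free. Data and attribute choice are illustrative.

block = [(3, "c", 9.1), (1, "a", 7.4), (2, "b", 8.2)]

replicas = {
    attr: sorted(block, key=lambda row: row[attr])
    for attr in (0, 1, 2)     # one sort order per replica
}

# A job filtering on attribute 2 reads the replica clustered on it:
print(replicas[2])   # [(1, 'a', 7.4), (2, 'b', 8.2), (3, 'c', 9.1)]
```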
Hashing is a solved problem. It allows us to get constant time access for lookups. Hashing is also simple. It is safe to use an arbitrary method as a black box and expect good performance, and optimizations to hashing can only improve it by a negligible delta. Why are all of the previous statements plain wrong? That is what this paper is about. In this paper we thoroughly study hashing for integer keys and carefully analyze the most common hashing methods in a five-dimensional requirements space: (1) data-distribution, (2) load factor, (3) dataset size, (4)...
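Two common integer hashing methods of the kind such studies compare, sketched below; the constants follow standard multiplicative-hashing recipes, and the paper's exact set of methods and parameters is not reproduced here:

```python
# Sketch: two integer hash methods with very different mixing costs.

def multiply_shift(key, bits):
    """Multiplicative hashing into 2**bits buckets (one multiply, one shift)."""
    return (key * 0x9E3779B97F4A7C15 & 0xFFFFFFFFFFFFFFFF) >> (64 - bits)

def murmur_finalizer(key):
    """Murmur3-style 64-bit finalizer: stronger mixing, more cycles."""
    key = (key ^ (key >> 33)) * 0xFF51AFD7ED558CCD & 0xFFFFFFFFFFFFFFFF
    key = (key ^ (key >> 33)) * 0xC4CEB9FE1A85EC53 & 0xFFFFFFFFFFFFFFFF
    return key ^ (key >> 33)

keys = range(0, 1 << 20, 8)                     # a structured, gapped key set
slots = [multiply_shift(k, 10) for k in keys]
print(len(set(slots)), "of 1024 buckets used")
```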
MapReduce is a computing paradigm that has gained a lot of popularity as it allows non-expert users to easily run complex analytical tasks at very large scale. At such scale, task and node failures are no longer an exception but rather a characteristic of large-scale systems. This makes fault-tolerance a critical issue for the efficient operation of any application. MapReduce automatically reschedules failed tasks to available nodes, which in turn recompute them from scratch. However, this policy can significantly decrease...
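A toy sketch of the checkpoint-and-resume alternative to recomputing from scratch; the in-memory checkpoint store and task model are invented for illustration, and the paper's actual recovery algorithms are not reproduced here:

```python
# Sketch: resuming a rescheduled task from a cheap local checkpoint instead
# of recomputing it from scratch. Checkpoint store is illustrative only.

checkpoints = {}          # task_id -> (records_done, partial_result)

def run_task(task_id, records):
    done, acc = checkpoints.get(task_id, (0, 0))
    for i in range(done, len(records)):
        acc += records[i]                    # the "work"
        checkpoints[task_id] = (i + 1, acc)  # checkpoint after each record
    return acc

data = list(range(10))
checkpoints["t1"] = (7, sum(data[:7]))   # simulate a failure after 7 records
print(run_task("t1", data))              # resumes at record 7 -> 45
```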
Database cracking has been an area of active research in recent years. The core idea of database cracking is to create indexes adaptively and incrementally as a side-product of query processing. Several works have proposed different cracking techniques for different aspects including updates, tuple-reconstruction, convergence, concurrency-control, and robustness. However, there is a lack of any comparative study of these methods by an independent group. In this paper, we conduct an experimental study on database cracking. Our goal is to critically review several...
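A minimal sketch of the core cracking step the abstract describes: a range query physically partitions the column around its bounds as a side effect, so later queries touch less data. This is a simplified illustration; a real cracker index also tracks the piece boundaries in a separate structure:

```python
# Sketch: crack a column around a range query's bounds while answering it.

def crack_in_three(column, lo, hi):
    """Partition column into (<lo), [lo, hi), (>=hi); return the middle."""
    left  = [v for v in column if v < lo]
    mid   = [v for v in column if lo <= v < hi]
    right = [v for v in column if v >= hi]
    column[:] = left + mid + right        # physical reorganization in place
    return mid                            # the query's answer

col = [13, 4, 55, 9, 2, 27, 18, 1]
print(crack_in_three(col, 5, 20))   # [13, 9, 18]
print(col)                          # [4, 2, 1, 13, 9, 18, 55, 27]
```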
The partition-based spatial-merge join (PBSM) of J.M. Patel and D.J. DeWitt (1996) and the size separation spatial join (S³J) of N. Koudas and K.C. Sevcik (1997) are considered to be among the most efficient methods for processing spatial (intersection) joins on two or more relations. Neither method assumes the presence of pre-existing indices. In this paper, we propose several improvements to these algorithms. In particular, we deal with the impact of data redundancy and duplicate detection on the performance of these methods. For PBSM, we present a simple...
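A sketch of reference-point duplicate avoidance, a standard remedy in partition-based spatial joins: a pair whose overlap spans several partitions is reported only by the partition containing one fixed reference point of the intersection. Grid and rectangles below are illustrative:

```python
# Sketch: report each overlapping pair exactly once, from the partition that
# contains the lower-left corner of the pair's intersection rectangle.

def intersect(a, b):
    """Intersection of two rectangles (x1, y1, x2, y2), or None."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 <= x2 and y1 <= y2 else None

def report_pair(a, b, partition):
    """Report (a, b) only if the reference point falls into this partition."""
    i = intersect(a, b)
    if i is None:
        return False
    px1, py1, px2, py2 = partition
    return px1 <= i[0] < px2 and py1 <= i[1] < py2

a, b = (1, 1, 6, 6), (4, 4, 9, 9)         # overlap spans two partitions
print(report_pair(a, b, (0, 0, 5, 5)))    # True: reference point (4, 4) here
print(report_pair(a, b, (5, 0, 10, 5)))   # False: duplicate avoided
```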
Like any large software system, a full-fledged DBMS offers an overwhelming amount of configuration knobs. These range from static initialisation parameters like buffer sizes, degree of concurrency, or level of replication to complex runtime decisions like creating a secondary index on a particular column or reorganising the physical layout of the store. To simplify the configuration, industry-grade DBMSs are usually shipped with various advisory tools that provide recommendations for given workloads and machines...
With prices of main memory constantly decreasing, people nowadays are more interested in performing their computations in main memory, and leave the high I/O costs of traditional disk-based systems out of the equation. This change of paradigm, however, represents new challenges to the way data should be stored and indexed in order to be processed efficiently. Traditional data structures, like the venerable B-tree, were designed to work on disk-based systems, but they are no longer the way to go for main memory, at least not in their original form, due to the poor cache utilization of the systems they run...
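The cache-utilization argument in back-of-the-envelope numbers; all figures below are illustrative assumptions, not measurements from the paper:

```python
# Sketch: a binary search inside a disk-sized B-tree node touches many
# 64-byte cache lines but uses only one 8-byte key per line, whereas a
# cache-line-sized node uses most of every fetched byte.
import math

CACHE_LINE, KEY = 64, 8
for node_bytes in (4096, 64):                   # disk page vs cache line
    keys_per_node = node_bytes // (2 * KEY)     # keys + child pointers
    probes = math.ceil(math.log2(keys_per_node + 1))
    lines = min(probes, node_bytes // CACHE_LINE)
    useful = probes * KEY / (lines * CACHE_LINE)
    print(f"{node_bytes}B node: {lines} lines touched, {useful:.0%} fetched bytes used")
```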
The recursive model index (RMI) has recently been introduced as a machine-learned replacement for traditional indexes over sorted data, achieving remarkably fast lookups. Follow-up work focused on explaining RMI's performance and automatically configuring RMIs through enumeration. Unfortunately, configuring RMIs involves setting several hyperparameters, the enumeration of which is often too time-consuming in practice. Therefore, in this work, we conduct the first inventor-independent broad analysis of RMIs with the goal...
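A two-stage RMI in miniature: a root rule picks a second-stage linear model, which predicts a position that a bounded local search then corrects. The segment count, model form, and error bound below are illustrative assumptions, not a specific RMI configuration:

```python
# Sketch: a tiny two-stage recursive model index over sorted integer keys.
import bisect

class TwoStageRMI:
    def __init__(self, keys, segments=4):
        self.keys = keys                     # sorted integer keys
        self.n, self.m = len(keys), segments
        step = self.n // self.m
        self.models = []                     # (slope, intercept) per segment
        for i in range(self.m):
            lo, hi = i * step, min((i + 1) * step, self.n - 1)
            slope = (hi - lo) / max(self.keys[hi] - self.keys[lo], 1)
            self.models.append((slope, lo - slope * self.keys[lo]))

    def lookup(self, key, err=64):           # err: assumed max model error
        seg = min(self.m - 1, self.m * key // (self.keys[-1] + 1))  # stage 1
        slope, intercept = self.models[seg]                         # stage 2
        guess = int(slope * key + intercept)
        lo, hi = max(0, guess - err), min(self.n, guess + err)
        i = bisect.bisect_left(self.keys, key, lo, hi)   # bounded correction
        return i if i < self.n and self.keys[i] == key else None

idx = TwoStageRMI(list(range(0, 2000, 2)))
print(idx.lookup(998))   # position 499
```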
SIGMOD 2008 was the first database conference that offered to test submitters' programs against their data to verify the experiments published. This paper discusses the rationale for this effort, the community's reaction, our experiences, and advice for future similar efforts.
Vertical partitioning is a crucial step in physical database design for row-oriented databases. A number of vertical partitioning algorithms have been proposed over the last three decades for a variety of niche scenarios. In principle, the underlying problem remains the same: decompose a table into one or more vertical partitions. However, it is not clear how good the different algorithms are in comparison to each other. In fact, it is not even clear how to experimentally compare the algorithms. In this paper, we present an exhaustive experimental study of several vertical partitioning algorithms. We categorize them along...
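To make the problem statement concrete, a deliberately naive strawman: given which attributes each query touches, derive one partition per distinct access pattern. The workload and rule below are illustrative and are not one of the surveyed algorithms:

```python
# Sketch: the input/output shape of vertical partitioning, via a strawman
# that creates one partition per distinct attribute-access pattern.

queries = [                       # attributes referenced by each query
    {"id", "name"},
    {"id", "name"},
    {"salary", "dept"},
]

def naive_vertical_partitions(queries):
    """One partition per distinct access pattern in the workload."""
    return {frozenset(q) for q in queries}

for part in naive_vertical_partitions(queries):
    print(sorted(part))           # ['id', 'name'] and ['dept', 'salary']
```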
Adaptive indexing is a concept that considers index creation in databases as a by-product of query processing; as opposed to traditional full index creation, where the effort is performed up front before answering any queries. Adaptive indexing has received a considerable amount of attention, and several algorithms have been proposed over the past few years; including a recent experimental study comparing a large number of existing methods. Until now, however, most adaptive indexing algorithms have been designed single-threaded, yet with multi-core systems already well...
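One simple way such refinement could be parallelized across cores, sketched below: crack disjoint chunks of a column independently and merge the qualifying pieces. This is a generic data-parallel strawman, not one of the paper's algorithms:

```python
# Sketch: data-parallel adaptive refinement over disjoint column chunks.
from concurrent.futures import ThreadPoolExecutor

def crack_chunk(chunk, lo, hi):
    """Partition one chunk around [lo, hi); return its qualifying middle."""
    chunk.sort(key=lambda v: (v >= lo) + (v >= hi))   # three-way partition
    return [v for v in chunk if lo <= v < hi]

column = [13, 4, 55, 9, 2, 27, 18, 1]
chunks = [column[:4], column[4:]]
with ThreadPoolExecutor() as pool:
    parts = pool.map(lambda c: crack_chunk(c, 5, 20), chunks)
print([v for part in parts for v in part])   # [13, 9, 18]
```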
Memory management is one of the most boring topics in database research. It plays a minor role in tasks like free-space management or efficient space usage. Here and there, we also realize its impact on performance when worrying about NUMA-aware memory allocation, data compacting, snapshotting, and defragmentation. But, overall, let's face it: the entire topic sounds as exciting as 'garbage collection' or 'debugging a program for memory leaks'. What if there were a technique that would promote memory management from a third-class helper thingie to a first...