Volker Markl

ORCID: 0009-0009-0964-026X
About
Research Areas
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Cloud Computing and Resource Management
  • Parallel Computing and Optimization Techniques
  • Scientific Computing and Data Management
  • Data Stream Mining Techniques
  • Advanced Data Storage Technologies
  • Graph Theory and Algorithms
  • Data Quality and Management
  • Distributed systems and fault tolerance
  • Data Mining Algorithms and Applications
  • Distributed and Parallel Computing Systems
  • Machine Learning and Data Classification
  • Semantic Web and Ontologies
  • Data Visualization and Analytics
  • Big Data and Business Intelligence
  • Algorithms and Data Compression
  • IoT and Edge/Fog Computing
  • Software System Performance and Reliability
  • Peer-to-Peer Network Technologies
  • Energy Efficient Wireless Sensor Networks
  • Advanced Image and Video Retrieval Techniques
  • Caching and Content Delivery
  • Neural Networks and Applications
  • Service-Oriented Architecture and Web Services

Technische Universität Berlin
2015-2024

German Research Centre for Artificial Intelligence
2015-2023

Berlin Institute for the Foundations of Learning and Data
2023

Singapore University of Technology and Design
2022

Walter de Gruyter (Germany)
2020

Delft University of Technology
2019

University of Potsdam
2019

German Central Institute for Social Issues
2019

IBM Research - Almaden
2004-2018

DSI Informationstechnik (Germany)
2018

This paper presents BigEarthNet, a new large-scale multi-label Sentinel-2 benchmark archive. BigEarthNet consists of 590,326 image patches, each of which is a section of: i) 120×120 pixels for the 10m bands; ii) 60×60 pixels for the 20m bands; and iii) 20×20 pixels for the 60m bands. Unlike most existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) provided by the CORINE Land Cover database of the year 2018 (CLC 2018). BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much more convenient to be used...

10.1109/igarss.2019.8900532 article EN IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium 2019-07-01

We present a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. The PACT programming model is a generalization of the well-known map/reduce model, extending it with further second-order functions, as well as Output Contracts that give guarantees about the behavior of a function. We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and deals with communication, synchronization, and fault tolerance....

10.1145/1807128.1807148 article EN 2010-06-10
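
The relationship between map/reduce and the additional PACT second-order functions can be sketched in a few lines. This is an illustrative toy, not the actual PACT/Nephele API: the function names (`pact_map`, `pact_reduce`, `pact_match`) are invented, and a real system would execute the per-key units in parallel rather than in a loop.

```python
from collections import defaultdict

# A PACT is a second-order function: it decides how input records are
# grouped into independently processable units, to which a first-order
# user-defined function (udf) is then applied.

def pact_map(udf, records):
    # MAP contract: every record forms its own unit.
    out = []
    for r in records:
        out.extend(udf(r))
    return out

def pact_reduce(udf, records):
    # REDUCE contract: all records sharing a key form one unit.
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(udf(key, values))
    return out

def pact_match(udf, left, right):
    # MATCH contract (one of the extra second-order functions): each
    # pair of records from the two inputs with equal keys forms one
    # unit -- the skeleton of an equi-join.
    index = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    out = []
    for key, value in right:
        for left_value in index[key]:
            out.extend(udf(key, left_value, value))
    return out

# Usage: word count expressed as MAP followed by REDUCE.
pairs = pact_map(lambda line: [(w, 1) for w in line.split()],
                 ["a b a", "b c"])
counts = pact_reduce(lambda k, vs: [(k, sum(vs))], pairs)
```

The point of the model is that each contract exposes its parallelization units declaratively, so the compiler can pick a data-flow strategy (e.g., repartition vs. broadcast for MATCH) without inspecting the user code.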

The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers---which usually assume that columns are statistically independent---to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for the automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful relations...

10.1145/1007568.1007641 article EN 2004-06-13
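
The dependence test at the core of this approach can be shown in miniature. This is a hedged sketch of the general idea (a chi-squared statistic over a cross-tabulated column-pair sample), not IBM's implementation; CORDS additionally prunes the search space and handles sampling, which is omitted here.

```python
from collections import Counter

def chi_squared(pairs):
    # Cross-tabulate a sample of (col1, col2) value pairs and measure
    # how far the observed joint counts deviate from the counts that
    # statistical independence would predict.
    n = len(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    joint = Counter(pairs)
    stat = 0.0
    for (a, b), observed in joint.items():
        expected = left[a] * right[b] / n
        stat += (observed - expected) ** 2 / expected
    return stat

# Correlated pair: city functionally determines country, so the
# statistic is large relative to the table's degrees of freedom.
correlated = [("Berlin", "DE"), ("Paris", "FR"), ("Munich", "DE")] * 100

# Perfectly independent pair: every value combination is equally
# frequent, so the statistic is exactly zero.
independent = [(a, b) for a in "AB" for b in "XY"] * 50
```

An optimizer armed with such flagged column pairs can then maintain joint statistics for them instead of multiplying per-column selectivities.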

Virtually every commercial query optimizer chooses the best plan for a query using a cost model that relies heavily on accurate cardinality estimation. Cardinality estimation errors can occur due to the use of inaccurate statistics, invalid assumptions about attribute independence, parameter markers, and so on, and may cause the optimizer to choose a sub-optimal plan. We present an approach to query processing that is extremely robust because it is able to detect and recover from cardinality estimation errors. We call this approach "progressive optimization" (POP). POP validates...

10.1145/1007568.1007642 article EN 2004-06-13
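
The detect-and-recover loop described above can be illustrated with a toy checkpoint operator. All names (`check`, `ReoptimizeSignal`, `execute`) and the tolerance factor are invented for this sketch; the paper's CHECK operators, validity ranges, and plan-switching machinery are far more sophisticated.

```python
class ReoptimizeSignal(Exception):
    """Raised when an observed cardinality falls outside the range
    for which the current plan was chosen."""

def check(rows, estimate, tolerance=10.0):
    # Validate the optimizer's cardinality estimate against the
    # actual row count flowing through this point in the plan.
    actual = len(rows)
    if not (estimate / tolerance <= actual <= estimate * tolerance):
        raise ReoptimizeSignal(f"estimated {estimate}, observed {actual}")
    return rows

def execute(plan, replan):
    # Run the operators in sequence; on a failed checkpoint, switch
    # to a re-optimized plan (here simply supplied by `replan`).
    rows = None
    try:
        for op in plan:
            rows = op(rows)
        return rows
    except ReoptimizeSignal:
        rows = None
        for op in replan():
            rows = op(rows)
        return rows

# Usage: the optimizer estimated 5 matching rows; the checkpoint
# observes 500 and triggers the fallback plan.
scan = lambda _: list(range(1000))
evens = lambda rows: [r for r in rows if r % 2 == 0]
plan = [scan, lambda rows: check(evens(rows), estimate=5)]
result = execute(plan, lambda: [scan, evens])
```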

The multi-core architectures of today's computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware like multi-core CPUs and graphics cards. We propose an alternative design for a database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime. This design reduces...

10.14778/2536360.2536370 article EN Proceedings of the VLDB Endowment 2013-07-01

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk iterative algorithms are supported by novel dataflow frameworks, these systems cannot exploit the computational dependencies present in algorithms such as graph algorithms. As a result, such algorithms are inefficiently executed and have led to specialized systems based on other paradigms, such as message passing or shared memory. We propose a method to integrate...

10.14778/2350229.2350245 article EN Proceedings of the VLDB Endowment 2012-07-01

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems on simple workloads, there is a clear gap in detailed analyses of the systems' performance. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate three widely used SDPSs in detail, namely Apache Storm, Spark, and Flink. Our evaluation focuses...

10.1109/icde.2018.00169 preprint EN 2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018-04-01

Modern Stream Processing Engines (SPEs) process large data volumes under tight latency constraints. Many SPEs execute processing pipelines using message passing on shared-nothing architectures and apply a partition-based scale-out strategy to handle high-velocity input streams. Furthermore, many state-of-the-art SPEs rely on a Java Virtual Machine to achieve platform independence and speed up system development by abstracting from the underlying hardware. In this paper, we show that taking the hardware into...

10.14778/3303753.3303758 article EN Proceedings of the VLDB Endowment 2019-01-01

Earth observation (EO) is a prime instrument for monitoring land and ocean processes, studying the dynamics at work, and taking the pulse of our planet. This article gives a bird's eye view of the essential scientific tools and approaches informing and supporting the transition from raw EO data to usable EO-based information. The promises, as well as the current challenges of these developments, are highlighted under dedicated sections. Specifically, we cover the impact of (i) Computer vision; (ii) Machine learning; (iii) Advanced...

10.48550/arxiv.2305.08413 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Increasingly large numbers of situational applications are being created by enterprise business users as a by-product of solving day-to-day problems. In efforts to address the demand for such applications, corporate IT is moving toward Web 2.0 architectures. In particular, the intranet is evolving into a platform of readily accessible data and services where communities can assemble and deploy situational applications. Damia is a web-style data integration platform developed for the problem these applications present, which is that they often need to access and combine data from a variety of sources....

10.1145/1376616.1376734 article EN 2008-06-09

Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the data volume disregard the semantics of visualizations and result in errors. In this work, we introduce M4, an aggregation-based dimensionality reduction technique that provides error-free...

10.14778/2732951.2732953 article EN Proceedings of the VLDB Endowment 2014-06-01
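
The core of M4 is compact enough to sketch: for each of `width` pixel columns of the target chart, keep only the first, last, minimum, and maximum points, since those four suffice to render the same line at that pixel width. This is a hedged in-memory illustration; the paper expresses M4 as relational queries pushed into the database, and the bucket-boundary handling here is simplified.

```python
def m4(points, width):
    # points: list of (t, v) pairs sorted by t; width: pixel columns.
    t0, t1 = points[0][0], points[-1][0]
    buckets = [[] for _ in range(width)]
    for t, v in points:
        # Map each timestamp to its pixel column (clamped to the last).
        i = min(int((t - t0) / (t1 - t0 + 1e-9) * width), width - 1)
        buckets[i].append((t, v))
    out = []
    for bucket in buckets:
        if not bucket:
            continue
        # First, last, min-value, and max-value point of the column;
        # a set removes duplicates when these coincide.
        keep = {bucket[0], bucket[-1],
                min(bucket, key=lambda p: p[1]),
                max(bucket, key=lambda p: p[1])}
        out.extend(sorted(keep))
    return out

# Usage: reduce 100 points to at most 4 per pixel column.
series = [(float(i), float((i * 7) % 13)) for i in range(100)]
reduced = m4(series, 10)
```

Because only at most four tuples survive per pixel column, the output size depends on the chart width rather than on the raw series length.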

Quickly and accurately estimating the selectivity of multidimensional predicates is a vital part of a modern relational query optimizer. The state of the art in this field are multidimensional histograms, which offer good estimation quality but are complex to construct and hard to maintain. Kernel Density Estimation (KDE) is an interesting alternative that does not suffer from these problems. However, existing KDE-based estimators can hardly compete with state-of-the-art methods.

10.1145/2723372.2749438 article EN 2015-05-27
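
A minimal sketch of how KDE yields a selectivity estimate, assuming a product Gaussian kernel (the general statistical technique; the paper's bandwidth optimization and GPU acceleration are out of scope, and the function name and fixed bandwidth below are illustrative).

```python
import math

def gauss_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kde_selectivity(sample, box, bandwidth=0.05):
    # sample: list of (x, y) attribute pairs drawn from the table.
    # box: ((x_lo, x_hi), (y_lo, y_hi)) -- a conjunctive range predicate.
    # With a product Gaussian kernel, integrating the density over the
    # box factorizes into per-dimension CDF differences per sample point.
    total = 0.0
    for point in sample:
        contrib = 1.0
        for (lo, hi), value in zip(box, point):
            contrib *= (gauss_cdf((hi - value) / bandwidth)
                        - gauss_cdf((lo - value) / bandwidth))
        total += contrib
    return total / len(sample)

# Usage: a 10x10 uniform sample over the unit square; the predicate
# x <= 0.5 AND y <= 0.5 should come out near 0.25.
sample = [(i / 9, j / 9) for i in range(10) for j in range(10)]
estimate = kde_selectivity(sample, ((0.0, 0.5), (0.0, 0.5)))
```

Unlike a histogram, the estimator is maintained simply by refreshing the sample, which is what makes KDE attractive for optimizers.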

Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for a reasonable utilization.

10.1145/3183713.3183734 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Approximately every five years, a group of database researchers meets for a self-assessment of our community, including reflections on our impact on industry as well as challenges facing the research community. This report summarizes the discussion and conclusions of the 9th such meeting, held during October 9-10, 2018 in Seattle.

10.1145/3385658.3385668 article EN ACM SIGMOD Record 2020-02-25

Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on performance. One of the biggest challenges in this field is to predict the result size of join operations. Kernel Density Estimation (KDE) is a statistical method to estimate multivariate probability distributions from a data sample. Previously, we introduced a modern,...

10.14778/3151106.3151112 article EN Proceedings of the VLDB Endowment 2017-09-01

GPUs have long been discussed as accelerators for database query processing because of their high processing power and memory bandwidth. However, two main challenges limit their utility for large-scale data processing: (1) the on-board memory capacity is too small to store large data sets, yet (2) the interconnect bandwidth to CPU main memory is insufficient for ad hoc transfers. As a result, GPU-based systems and algorithms run into a transfer bottleneck and do not scale to large data sets. In practice, CPUs process large data sets faster than GPUs with current technology. In this...

10.1145/3318464.3389705 article EN 2020-05-29

SQL has emerged as an industry standard for querying relational database management systems, largely because a user need only specify what data is wanted, not the details of how to access that data. A query optimizer uses a mathematical model of query execution to determine automatically the best way to access and process any given query. This model is heavily dependent upon the optimizer's estimates of the number of rows that will result at each step of the query execution plan (QEP), especially for complex queries involving many predicates and/or operations. These estimates rely...

10.1147/sj.421.0098 article EN IBM Systems Journal 2003-01-01