- Advanced Database Systems and Queries
- Data Management and Algorithms
- Cloud Computing and Resource Management
- Parallel Computing and Optimization Techniques
- Scientific Computing and Data Management
- Data Stream Mining Techniques
- Advanced Data Storage Technologies
- Graph Theory and Algorithms
- Data Quality and Management
- Distributed Systems and Fault Tolerance
- Data Mining Algorithms and Applications
- Distributed and Parallel Computing Systems
- Machine Learning and Data Classification
- Semantic Web and Ontologies
- Data Visualization and Analytics
- Big Data and Business Intelligence
- Algorithms and Data Compression
- IoT and Edge/Fog Computing
- Software System Performance and Reliability
- Peer-to-Peer Network Technologies
- Energy Efficient Wireless Sensor Networks
- Advanced Image and Video Retrieval Techniques
- Caching and Content Delivery
- Neural Networks and Applications
- Service-Oriented Architecture and Web Services
Technische Universität Berlin
2015-2024
German Research Centre for Artificial Intelligence
2015-2023
Berlin Institute for the Foundations of Learning and Data
2023
Singapore University of Technology and Design
2022
Walter de Gruyter (Germany)
2020
Delft University of Technology
2019
University of Potsdam
2019
German Central Institute for Social Issues
2019
IBM Research - Almaden
2004-2018
DSI Informationstechnik (Germany)
2018
This paper presents BigEarthNet, a new large-scale multi-label Sentinel-2 benchmark archive. BigEarthNet consists of 590,326 image patches, each of which is a section of: i) 120×120 pixels for the 10m bands; ii) 60×60 pixels for the 20m bands; and iii) 20×20 pixels for the 60m bands. Unlike most existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much more convenient to be used...
We present a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with further second-order functions, as well as with Output Contracts that give guarantees about the behavior of a function. We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and deals with communication, synchronization, and fault tolerance....
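The contract idea, where a second-order function takes a first-order user function and decides how records are grouped before the user function is applied, can be sketched in plain Python. This is a toy illustration only, not the Nephele/PACT API; all names below are hypothetical:

```python
# Hypothetical sketch of PACT-style second-order functions.

def pact_map(user_fn, records):
    """MAP contract: user_fn is invoked independently on every record."""
    return [out for rec in records for out in user_fn(rec)]

def pact_reduce(user_fn, records, key):
    """REDUCE contract: user_fn sees all records that share a key."""
    groups = {}
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    return [out for group in groups.values() for out in user_fn(group)]

def pact_match(user_fn, left, right, key_l, key_r):
    """MATCH contract (beyond map/reduce): user_fn is invoked for each
    pair of records from the two inputs that share a key (an equi-join)."""
    index = {}
    for rec in left:
        index.setdefault(key_l(rec), []).append(rec)
    return [out
            for r in right
            for l in index.get(key_r(r), [])
            for out in user_fn(l, r)]

# Word count expressed with MAP + REDUCE.
lines = ["to be or", "not to be"]
pairs = pact_map(lambda line: [(w, 1) for w in line.split()], lines)
counts = pact_reduce(lambda group: [(group[0][0], sum(c for _, c in group))],
                     pairs, key=lambda kv: kv[0])
```

An Output Contract would additionally annotate a user function, e.g. promising that it leaves the key unchanged, so the compiler can skip a re-partitioning step.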
The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers (which usually assume that columns are statistically independent) to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for the automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful relations...
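The kind of dependence test such a tool can run on a sampled column pair can be illustrated with a chi-squared statistic: a large value signals correlation, a value near zero is consistent with independence. This sketch is illustrative only; CORDS's actual sampling strategy and thresholds differ:

```python
# Toy chi-squared test for independence of two columns over sampled rows.
from collections import Counter

def chi_squared(rows):
    n = len(rows)
    c1 = Counter(a for a, _ in rows)      # marginal counts of column 1
    c2 = Counter(b for _, b in rows)      # marginal counts of column 2
    joint = Counter(rows)                 # joint counts of value pairs
    stat = 0.0
    for a, na in c1.items():
        for b, nb in c2.items():
            expected = na * nb / n        # count expected under independence
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

# Perfectly correlated pair (city determines country): large statistic.
correlated = [("Paris", "FR"), ("Berlin", "DE")] * 50
# Independent pair (all combinations equally likely): statistic is zero.
independent = [(a, b) for a in ("x", "y") for b in ("u", "v")] * 25

corr_stat = chi_squared(correlated)
indep_stat = chi_squared(independent)
```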
Virtually every commercial query optimizer chooses the best plan for a query using a cost model that relies heavily on accurate cardinality estimation. Cardinality estimation errors can occur due to the use of inaccurate statistics, invalid assumptions about attribute independence, parameter markers, and so on. Such errors may cause the optimizer to choose a sub-optimal plan. We present an approach to query processing that is extremely robust because it is able to detect and recover from cardinality estimation errors. We call this approach "progressive query optimization" (POP). POP validates...
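The detect-and-recover idea can be sketched as a checkpoint that guards the optimizer's estimate with a validity range and re-optimizes when the actual row count falls outside it. The names and the plan choice below are hypothetical illustrations, not the paper's implementation:

```python
# Toy sketch of a POP-style validity check.

def run_with_pop(scan, validity_range, plan_for):
    rows = list(scan)                       # materialize at the CHECK point
    lo, hi = validity_range
    if lo <= len(rows) <= hi:
        return plan_for("estimated")(rows)  # estimate held: keep current plan
    return plan_for(len(rows))(rows)        # mis-estimate: re-optimize remainder

def plan_for(cardinality):
    # Illustrative plan choice: small inputs favor a nested-loop style plan,
    # large inputs a hash-based one.
    if cardinality == "estimated" or cardinality < 100:
        return lambda rows: ("nested_loop", len(rows))
    return lambda rows: ("hash_join", len(rows))

plan_kept = run_with_pop(range(10), (0, 100), plan_for)      # estimate valid
plan_switched = run_with_pop(range(500), (0, 100), plan_for) # re-optimized
```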
The multi-core architectures of today's computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware, like multi-core CPUs and graphics cards. We propose an alternative design for a database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime. This design reduces...
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk iterative algorithms are supported by novel dataflow frameworks, these frameworks cannot exploit the computational dependencies present in many problems, such as graph algorithms. As a result, these algorithms are inefficiently executed and have led to specialized systems based on other paradigms, such as message passing or shared memory. We propose a method to integrate...
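The gap between bulk and incremental iteration can be illustrated with connected components: an incremental (delta) iteration re-processes only vertices whose label changed in the previous superstep, so later supersteps touch ever-smaller parts of the graph. This is an illustrative sketch, not the actual dataflow API:

```python
# Connected components via label propagation with a shrinking workset.

def connected_components(edges, vertices):
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    label = {v: v for v in vertices}   # each vertex starts as its own component
    workset = set(vertices)            # delta set: vertices to re-process
    supersteps = 0
    while workset:
        supersteps += 1
        changed = set()
        for v in workset:
            for n in neighbors[v]:
                if label[v] < label[n]:    # propagate the smaller label
                    label[n] = label[v]
                    changed.add(n)
        workset = changed              # only changed vertices survive
    return label, supersteps

# Two components: {1, 2, 3} and {4, 5}.
label, steps = connected_components([(1, 2), (2, 3), (4, 5)], [1, 2, 3, 4, 5])
```

A bulk iteration would re-process all five vertices in every superstep; here the workset empties as labels stabilize.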
Database researchers paint big data as a defining challenge. To make the most of the enormous opportunities at hand will require focusing on five research areas.
The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear gap in detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses...
Modern Stream Processing Engines (SPEs) process large data volumes under tight latency constraints. Many SPEs execute processing pipelines using message passing on shared-nothing architectures and apply a partition-based scale-out strategy to handle high-velocity input streams. Furthermore, many state-of-the-art SPEs rely on a Java Virtual Machine to achieve platform independence and to speed up system development by abstracting from the underlying hardware. In this paper, we show that taking the underlying hardware into account...
Earth observation (EO) is a prime instrument for monitoring land and ocean processes, studying the dynamics at work, and taking the pulse of our planet. This article gives a bird's eye view of the essential scientific tools and approaches informing and supporting the transition from raw EO data to usable EO-based information. The promises, as well as the current challenges of these developments, are highlighted under dedicated sections. Specifically, we cover the impact of (i) Computer vision; (ii) Machine learning; (iii) Advanced...
Increasingly large numbers of situational applications are being created by enterprise business users as a by-product of solving day-to-day problems. In an effort to address the demand for such applications, corporate IT is moving toward Web 2.0 architectures. In particular, the corporate intranet is evolving into a platform of readily accessible data and services where communities of business users can assemble and deploy situational applications. Damia is a web-style data integration platform developed to address the data problem presented by such applications, which often need to access and combine data from a variety of sources...
Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary, RDBMS-based systems for the visualization of time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the data volume disregard the semantics of visualizations and result in visualization errors. In this work, we introduce M4, an aggregation-based time series dimensionality reduction technique that provides error-free...
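The aggregation idea can be sketched as follows: for a chart that is w pixel columns wide, keep per column only the tuples carrying the minimum value, maximum value, first timestamp, and last timestamp, i.e., at most 4·w tuples. This is illustrative Python rather than the paper's SQL rewriting, and the bucket arithmetic is simplified:

```python
# M4-style reduction: at most 4 tuples per pixel column.

def m4(series, w):
    """series: list of (timestamp, value), sorted by timestamp."""
    t0, t1 = series[0][0], series[-1][0]
    buckets = [[] for _ in range(w)]
    for t, v in series:
        i = min(int((t - t0) * w / (t1 - t0 + 1e-9)), w - 1)
        buckets[i].append((t, v))
    out = set()
    for b in buckets:
        if not b:
            continue
        out.add(min(b))                       # first tuple (min timestamp)
        out.add(max(b))                       # last tuple (max timestamp)
        out.add(min(b, key=lambda p: p[1]))   # tuple with min value
        out.add(max(b, key=lambda p: p[1]))   # tuple with max value
    return sorted(out)

series = [(t, (t * 7) % 13) for t in range(1000)]  # 1000 points
reduced = m4(series, 50)                           # at most 200 points
```

Because only the line pixels between these four extrema per column can differ, the rendered chart stays visually faithful while the transferred volume drops sharply.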
Quickly and accurately estimating the selectivity of multidimensional predicates is a vital part of a modern relational query optimizer. The state of the art in this field are multidimensional histograms, which offer good estimation quality but are complex to construct and hard to maintain. Kernel Density Estimation (KDE) is an interesting alternative that does not suffer from these problems. However, existing KDE-based selectivity estimators can hardly compete with the estimation quality of state-of-the-art methods.
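A minimal one-dimensional sketch of KDE-based selectivity estimation: place a Gaussian kernel on each sample point and estimate the selectivity of a range predicate as the average probability mass the kernels put inside the range. The published estimators are multidimensional and tune the bandwidth numerically; Silverman's rule-of-thumb here is a simplifying assumption:

```python
# 1-D KDE selectivity estimate for the predicate lo <= x <= hi.
import math
import random

def kde_selectivity(sample, lo, hi):
    n = len(sample)
    mean = sum(sample) / n
    sd = (sum((x - mean) ** 2 for x in sample) / n) ** 0.5
    h = 1.06 * sd * n ** -0.2          # Silverman's rule-of-thumb bandwidth
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    # Average probability mass each sample's kernel places inside [lo, hi].
    return sum(phi((hi - x) / h) - phi((lo - x) / h) for x in sample) / n

random.seed(0)
sample = [random.gauss(0, 1) for _ in range(1000)]
est = kde_selectivity(sample, -1, 1)   # true answer is about 0.683 for N(0, 1)
```

Unlike a histogram, such a model can be maintained by simply replacing sample points as the underlying data changes.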
Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for a reasonable utilization.
Approximately every five years, a group of database researchers meets to do a self-assessment of our community, including reflections on our impact on the industry as well as challenges facing our research community. This report summarizes the discussion and conclusions of the 9th such meeting, held during October 9-10, 2018 in Seattle.
Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on query performance. One of the biggest challenges in this field is to predict the result size of join operations. Kernel Density Estimation (KDE) is a statistical method to estimate multivariate probability distributions from a data sample. Previously, we introduced a modern,...
GPUs have long been discussed as accelerators for database query processing because of their high processing power and memory bandwidth. However, two main challenges limit their utility for large-scale data processing: (1) the on-board memory capacity is too small to store large data sets, yet (2) the interconnect bandwidth to CPU main memory is insufficient for ad hoc data transfers. As a result, GPU-based systems and algorithms run into a transfer bottleneck and do not scale to large data sets. In practice, CPUs process large-scale data faster than GPUs with current technology. In this...
SQL has emerged as an industry standard for querying relational database management systems, largely because a user need only specify what data is wanted, not the details of how to access that data. A query optimizer uses a mathematical model of query execution to determine automatically the best way to access and process any given query. This model is heavily dependent upon the optimizer's estimates of the number of rows that will result at each step of the query execution plan (QEP), especially for complex queries involving many predicates and/or operations. These estimates rely...
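Why such row estimates go wrong for conjunctive predicates can be shown with a toy example: multiplying per-predicate selectivities assumes the predicates are independent, which correlated columns violate. The data below is hypothetical:

```python
# Correlated predicates: make = 'Honda' and model = 'Civic'
# (in this toy table, only Hondas are Civics).
rows = [("Honda", "Civic")] * 50 + [("BMW", "3 Series")] * 50

sel_make = sum(m == "Honda" for m, _ in rows) / len(rows)     # 0.5
sel_model = sum(mo == "Civic" for _, mo in rows) / len(rows)  # 0.5

# Under the independence assumption the optimizer multiplies selectivities.
independent_est = sel_make * sel_model                        # 0.25

# The true combined selectivity is twice as large.
actual = sum(m == "Honda" and mo == "Civic" for m, mo in rows) / len(rows)
```

With more correlated predicates the factors compound, which is how estimates end up off by orders of magnitude on real data.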