- Cloud Computing and Resource Management
- Advanced Database Systems and Queries
- IoT and Edge/Fog Computing
- Distributed Systems and Fault Tolerance
- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Energy Efficient Wireless Sensor Networks
- Data Management and Algorithms
- Distributed and Parallel Computing Systems
- Graph Theory and Algorithms
- Software System Performance and Reliability
- Caching and Content Delivery
- Peer-to-Peer Network Technologies
- Data Stream Mining Techniques
- Algorithms and Data Compression
- Stochastic Gradient Optimization Techniques
- Advanced Image and Video Retrieval Techniques
- Advanced Clustering Algorithms Research
- Context-Aware Activity Recognition Systems
- Blockchain Technology Applications and Security
- Privacy-Preserving Technologies in Data
- Indoor and Outdoor Localization Technologies
- Web Data Mining and Analysis
- Network Packet Processing and Optimization
- Machine Learning and Data Classification
Technische Universität Berlin
2019-2024
German Research Centre for Artificial Intelligence
2018-2022
Humboldt-Universität zu Berlin
2014-2017
Modern Stream Processing Engines (SPEs) process large data volumes under tight latency constraints. Many SPEs execute processing pipelines using message passing on shared-nothing architectures and apply a partition-based scale-out strategy to handle high-velocity input streams. Furthermore, many state-of-the-art SPEs rely on the Java Virtual Machine to achieve platform independence and to speed up system development by abstracting from the underlying hardware. In this paper, we show that taking hardware into...
GPUs have long been discussed as accelerators for database query processing because of their high processing power and memory bandwidth. However, two main challenges limit their utility for large-scale data processing: (1) the on-board memory capacity is too small to store large data sets, yet (2) the interconnect bandwidth to CPU main memory is insufficient for ad hoc data transfers. As a result, GPU-based systems and algorithms run into a transfer bottleneck and do not scale to large data sets. In practice, CPUs process large data sets faster than GPUs with current technology. In this...
Scale-out stream processing engines (SPEs) are powering large big data applications on high-velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing, and thus need to transparently reconfigure stateful queries during runtime. However, state-of-the-art SPEs are not yet ready to handle on-the-fly reconfigurations of queries with terabytes of state due to three problems. These include the network overhead for state migration and the need to maintain consistency during reconfiguration. In this paper, we propose Rhino, a library for efficient...
Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They follow an interpretation-based processing model and do not perform runtime optimizations. This limits the utilization of modern hardware and neglects changing data characteristics at runtime. In this paper, we present Grizzly, a novel adaptive query compilation-based SPE that enables highly efficient query execution. We extend query compilation and task-based parallelization for the unique requirements of stream processing and apply...
Data management systems will face several new challenges in supporting IoT applications during the coming years. These challenges arise from managing large numbers of heterogeneous devices and require combining elastic cloud and fog resources into unified fog-cloud environments. In this demonstration, we introduce a smart city simulation called IoTropolis and use it to create interactive eHealth and Smart Grid application scenarios. We use these scenarios to showcase three key challenges. Furthermore, we demonstrate how our recently...
The Internet of Things (IoT) presents a novel computing architecture for data management: a distributed, highly dynamic, and heterogeneous environment of massive scale. Applications for the IoT introduce new challenges for integrating the concepts of fog and cloud computing as well as sensor networks in one unified environment. In this paper, we highlight these major challenges and outline how existing systems handle them. To address these challenges, we introduce the NebulaStream platform, a general-purpose, end-to-end data management system for the IoT. It addresses the heterogeneity...
The Internet of Things (IoT) represents one of the fastest emerging trends in the area of information and communication technology. The main challenge in the IoT is the timely gathering of data streams from potentially millions of sensors. In particular, those sensors are widely distributed, constantly in transit, highly heterogeneous, and unreliable. To gather data in such a dynamic environment efficiently, two techniques have emerged over the last decade: adaptive sampling and adaptive filtering. These techniques dynamically reconfigure sampling rates and filter thresholds...
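To make the idea of adaptive sampling concrete, the following is a minimal sketch, not the paper's actual algorithm: the sampling interval shrinks while consecutive readings differ by more than a threshold and backs off while the signal is stable. All names and constants (`base_interval`, `threshold`, the doubling/halving policy) are illustrative assumptions.

```python
def adaptive_sampling(readings, base_interval=1.0, min_interval=0.25,
                      max_interval=8.0, threshold=0.5):
    """Return (interval, value) pairs: sample faster when the signal is
    volatile, slower when it is stable (illustrative policy only)."""
    interval = base_interval
    last = None
    schedule = []
    for value in readings:
        if last is not None:
            if abs(value - last) > threshold:
                # Volatile signal: halve the interval (sample more often).
                interval = max(min_interval, interval / 2)
            else:
                # Stable signal: double the interval (save energy).
                interval = min(max_interval, interval * 2)
        schedule.append((interval, value))
        last = value
    return schedule
```

A stable stream drives the interval up to `max_interval`, while an oscillating one drives it down to `min_interval`, which is the essential behavior both adaptive sampling and adaptive filtering rely on.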
A recent trend in stream processing is offloading the computation of decomposable aggregation functions (DAFs) from cloud nodes to geo-distributed fog/edge devices to decrease latency and improve energy efficiency. However, deploying DAFs on low-end devices is challenging due to their volatility and limited resources. Additionally, in these environments, creating new operator instances on demand and replicating operators ubiquitously is restricted, posing challenges for achieving load balancing without overloading devices. Existing...
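The property that makes DAFs offloadable is that they split into a lift/combine/lower triple with an associative combine step, so edge devices can pre-aggregate locally and ship only small partials upstream. A minimal sketch (the triple formulation is standard for decomposable aggregates; the names `AVG` and `run_daf` are illustrative):

```python
from functools import reduce

# A decomposable aggregation function as a (lift, combine, lower) triple:
#   lift:    one input value -> partial aggregate
#   combine: merge two partials (associative, so partitioning is free)
#   lower:   final partial -> result
AVG = (
    lambda x: (x, 1),                          # lift: (sum, count)
    lambda a, b: (a[0] + b[0], a[1] + b[1]),   # combine: add componentwise
    lambda p: p[0] / p[1],                     # lower: sum / count
)

def run_daf(daf, partitions):
    """Pre-aggregate each partition locally (edge devices), then merge the
    small partials and finalize (cloud node)."""
    lift, combine, lower = daf
    partials = [reduce(combine, map(lift, part)) for part in partitions]
    return lower(reduce(combine, partials))
```

Because `combine` is associative, splitting the input across devices must not change the result: `run_daf(AVG, [[1, 2], [3, 4, 5]])` equals `run_daf(AVG, [[1, 2, 3, 4, 5]])`.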
Today's users of data processing systems come from different domains, have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements have shifted from relational to polyglot queries involving user-defined functions (UDFs). Although some systems support polyglot queries, they often embed third-party language runtimes. This embedding induces a high performance overhead, as it causes additional data materialization between execution engines. In this paper, we present Babelfish, a novel...
Database management systems are facing growing data volumes. Previous research suggests that GPUs are well-equipped to quickly process joins and similar stateful operators, as they feature high-bandwidth on-board memory. However, GPUs cannot scale to large data volumes due to two limiting factors: (1)~large state does not fit into the on-board memory, and (2)~spilling to main memory is constrained by the interconnect bandwidth. Thus, CPUs are often the better choice for scalable data processing.
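A back-of-envelope calculation illustrates why the interconnect, not the GPU itself, dominates once data spills to main memory. The bandwidth figures below are rough ballpark assumptions (PCIe 3.0 x16 and HBM2-class memory), not measurements from the paper:

```python
# Ballpark bandwidths (assumptions, not measurements):
pcie_gbps = 16    # host-to-GPU interconnect (PCIe 3.0 x16), GB/s
hbm_gbps = 900    # GPU on-board memory (HBM2-class), GB/s
table_gb = 256    # working set that exceeds on-board capacity

transfer_s = table_gb / pcie_gbps  # time just to ship the data to the GPU
scan_s = table_gb / hbm_gbps       # time for the GPU to scan it once

print(f"transfer: {transfer_s:.1f}s, on-GPU scan: {scan_s:.2f}s, "
      f"ratio: {transfer_s / scan_s:.0f}x")
# → transfer: 16.0s, on-GPU scan: 0.28s, ratio: 56x
```

With these assumptions, shipping the data takes roughly 56x longer than scanning it on the GPU, which is the transfer bottleneck the abstract describes.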
The intra-window join (IaWJ), i.e., joining two input streams over a single window, is a core operation in modern stream processing applications. This paper presents the first comprehensive study on parallelizing the IaWJ on multicore architectures. In particular, we classify IaWJ algorithms into lazy and eager execution approaches. For each approach, there are further design aspects to consider, including different join methods and partitioning schemes, leading to a large design space. Our results show that none of the algorithms always...
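The lazy/eager distinction can be sketched with two single-threaded hash-join variants over one window (a simplification of the paper's parallel setting; the function names and event encoding are illustrative). Eager execution probes on every arrival and produces results immediately; lazy execution buffers the window and joins it in one batch when the window closes:

```python
def eager_iawj(events, key=lambda t: t[0]):
    """Eager: on each arrival, probe the other stream's hash table, then
    insert. `events` is the window in arrival order, tagged 'R' or 'S'."""
    tables, out = {'R': {}, 'S': {}}, []
    for side, tup in events:
        other = tables['S' if side == 'R' else 'R']
        for match in other.get(key(tup), []):
            out.append((tup, match) if side == 'R' else (match, tup))
        tables[side].setdefault(key(tup), []).append(tup)
    return out

def lazy_iawj(events, key=lambda t: t[0]):
    """Lazy: buffer until the window closes, then one batch hash join."""
    r_tuples = [t for side, t in events if side == 'R']
    s_tuples = [t for side, t in events if side == 'S']
    ht = {}
    for r in r_tuples:
        ht.setdefault(key(r), []).append(r)
    return [(r, s) for s in s_tuples for r in ht.get(key(s), [])]
```

Both variants produce the same join result for a completed window; they differ in when results become visible and in how the work can be partitioned across cores, which is exactly the design space the study explores.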
Progressive optimization introduces robustness for database workloads against wrong estimates, skewed data, correlated attributes, or outdated statistics. Previous work focuses on cardinality estimates and relies on expensive counting methods as well as complex learning algorithms. In this paper, we utilize performance counters to drive progressive optimization during query execution. The main advantages are that performance counters introduce virtually no costs on modern CPUs and that their usage enables non-invasive monitoring. We present...
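The control loop behind progressive optimization can be sketched as follows. This is a simplified illustration, not the paper's mechanism: a cheap runtime signal (here observed selectivity standing in for a hardware performance counter) is compared against the optimizer's estimate at fixed intervals, and a large deviation hands control back to the optimizer mid-query. All parameter names and thresholds are assumptions:

```python
def progressive_scan(tuples, est_selectivity, pred, reoptimize,
                     check_every=1000, factor=2.0):
    """Filter `tuples` with `pred`, periodically comparing the observed
    selectivity against the estimate; call `reoptimize` on large deviation."""
    seen = emitted = 0
    out = []
    for t in tuples:
        seen += 1
        if pred(t):
            emitted += 1
            out.append(t)
        if seen % check_every == 0:
            observed = emitted / seen
            # Deviation beyond `factor` in either direction triggers
            # mid-query re-optimization.
            if (observed > est_selectivity * factor
                    or observed < est_selectivity / factor):
                reoptimize(observed)
    return out
```

For example, if the optimizer estimated a selectivity of 0.5 but only 10% of the first 1000 tuples qualify, the monitor fires and the remaining plan can be re-optimized with the observed value.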
The Internet of Things (IoT) combines large data centers with (mobile, networked) edge devices that are constrained both in compute power and energy budget. Modern edge devices contribute to query processing by leveraging accelerated processing units, multicore CPUs, or GPUs. Therefore, the IoT presents the challenges of 1) minimizing the energy consumed while sustaining a given throughput, and 2) processing increasingly complex queries within...
Data science workflows are largely exploratory, dealing with under-specified objectives, open-ended problems, and unknown business value. Therefore, little investment is made in the systematic acquisition, integration, and pre-processing of data. This lack of infrastructure results in redundant manual effort and computation. Furthermore, central data consolidation is not always technically or economically desirable, or even feasible (e.g., due to privacy and/or data ownership). The ExDRa system aims to provide infrastructure for this...
Join ordering and query optimization are crucial for query performance but remain challenging due to unknown or changing characteristics of intermediates, especially for complex queries with many joins. Over the past two decades, a spectrum of techniques for adaptive query processing (AQP)---including inter-/intra-operator adaptivity and tuple routing---has been proposed to address these challenges. However, commercial database systems in practice do not implement holistic AQP because these techniques increase system complexity...
Engineering high-performance query execution engines is a challenging task. Query compilation provides excellent performance, but at the same time introduces significant system complexity, as it makes the engine hard to build, debug, and maintain. To overcome this problem, we propose Nautilus, a framework that combines the ease of use of interpretation with the performance of compilation. On the one hand, Nautilus provides an interpretation-based operator interface that enables engineers to implement operators using imperative C++ code to ensure...
Modern processors employ sophisticated techniques such as speculative or out-of-order execution to hide memory latencies and keep their pipelines fully utilized. However, these techniques introduce high complexity and variance into query processing. In particular, they are transparent to DBMS operations, since they are managed by the CPU internally. To utilize the capabilities of modern CPUs, it is necessary to understand their characteristics and adjust operators as well as cost models accordingly.
Remote Direct Memory Access (RDMA) hardware has bridged the gap between network and main-memory speed and thus invalidated the common assumption that the network is often the bottleneck in distributed data processing systems. However, high-speed networks do not provide "plug-and-play" performance (e.g., using IP-over-InfiniBand) and require a careful co-design of system and application logic. As a result, system designers need to rethink the architecture of their data management systems to benefit from RDMA acceleration. In this paper, we focus...