NFDI4DS | UHH-SEMS - Publication Details

Developing and Qualifying an ML Application for MRO Assistance

OPENALEX - Publications

Helena Ebel, Gerald Kremer, Morgan K. Geldenhuys, Oksana Rasskazova, Robert S. Kern, and 1 more

This study presents a framework for integrating and qualifying Machine Learning (ML) in Maintenance, Repair, and Overhaul (MRO) processes for gas turbines. Using neural networks for damage detection and decision trees for repair estimation, it emphasizes continuous qualification aligned with ISO/IEC standards and responsible AI principles. An interactive guide supports systematic ML implementation, ensuring transparency and compliance in Industry 4.0. Validated through two turbine blade case studies, the approach...

10.1515/zwf-2024-0133 article EN cc-by Zeitschrift für wirtschaftlichen Fabrikbetrieb 2025-03-20

Dependable IoT Data Stream Processing for Monitoring and Control of Urban Infrastructures

Morgan K. Geldenhuys, Jonathan Will, Benjamin J. J. Pfister, Martin Haug, Alexander Scharmann, and 1 more

The Internet of Things describes a network of physical devices interacting and producing vast streams of sensor data. At present there are a number of general challenges which exist while developing solutions for use cases involving the monitoring and control of urban infrastructures. These include the need for a dependable method of extracting value from this high-volume, time-sensitive data that is adaptive to changing workloads. Low-latency access to the current state for live monitoring is a necessity, as is the ability to perform queries on historical...

10.1109/ic2e52221.2021.00041 article EN 2021-10-01

Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

Dominik Scheinert, Alexander Acker, Lauritz Thamsen, Morgan K. Geldenhuys, Odej Kao

Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for the identification and localization of anomalies in such systems supports experts and enables fast mitigation. However, due to the various inter-dependencies of system components, anomalies do not only affect their origin but propagate through the system. Taking this into account, we present Arvalus and its variant D-Arvalus,...

10.1109/cloudintelligence52565.2021.00011 preprint EN 2021-05-01

Effectively Testing System Configurations of Critical IoT Analytics Pipelines

Morgan K. Geldenhuys, Lauritz Thamsen, Kain Kordian Gontarska, Felix Lorenz, Odej Kao

The emergence of the Internet of Things has seen the introduction of numerous connected devices used for the monitoring and control of even Critical Infrastructures. Distributed stream processing has become key to analyzing the data generated by these devices and improving our ability to make decisions. However, optimizing these systems towards specific Quality of Service targets is a difficult and time-consuming task, due to the large-scale distributed systems involved, the existence of so many configuration parameters, and the inability to easily determine the impact of tuning...

10.1109/bigdata47090.2019.9005504 article EN 2019 IEEE International Conference on Big Data (Big Data) 2019-12-01

Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems

Benjamin J. J. Pfister, Dominik Scheinert, Morgan K. Geldenhuys, Odej Kao

To maintain a stable Quality of Service (QoS), Distributed Stream Processing (DSP) systems require a sufficient allocation of resources. At the same time, over-provisioning can result in wasted energy and high operating costs. Therefore, to maximize resource utilization, autoscaling methods have been proposed that aim to efficiently match resources with the incoming workload. However, determining when and by how much to scale remains a significant challenge. Given the long-running nature of DSP jobs, scaling actions need to be executed at runtime, good QoS,...

10.1145/3629526.3645042 article EN cc-by 2024-05-07

Chiron: Optimizing Fault Tolerance in QoS-aware Distributed Stream Processing Jobs

Morgan K. Geldenhuys, Lauritz Thamsen, Odej Kao

Fault tolerance is a property which needs deeper consideration when dealing with streaming jobs requiring high levels of availability and low-latency processing even in the case of failures where Quality-of-Service constraints must be adhered to. Typically, systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing Checkpoint and Rollback Recovery. However, this is an expensive operation which impacts negatively on the overall performance of the system, and manually optimizing for specific...

10.1109/bigdata50022.2020.9378474 article EN 2020 IEEE International Conference on Big Data (Big Data) 2020-12-10

Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads

Morgan K. Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

Distributed Stream Processing systems have become an essential part of big data processing platforms. They are characterized by the high-throughput processing of near to real-time event streams with the goal of delivering low-latency results and thus enabling time-sensitive decision making. At the same time, results are expected to be consistent even in the presence of partial failures where exactly-once guarantees are required for correctness. Stream processing workloads are oftentimes dynamic in nature, which makes static configurations highly inefficient as time...

10.1109/icws55610.2022.00041 article EN 2022-07-01

Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

Dominik Scheinert, Houkun Zhu, Lauritz Thamsen, Morgan K. Geldenhuys, Jonathan Will, and 2 more

Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual performance of jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated targets despite significant variance. This paper presents Enel, a novel approach that uses message propagation on an attributed graph to model dataflow jobs and, thus,...

10.1109/ipccc51483.2021.9679361 preprint EN 2021-10-29

A Scalable and Dependable Data Analytics Platform for Water Infrastructure Monitoring

Felix Lorenz, Morgan K. Geldenhuys, Harald Sommer, Frauke Jakobs, Carsten Lüring, and 3 more

With weather becoming more extreme both in terms of longer dry periods and more severe rain events, municipal water networks are increasingly under pressure. The effects include damages to the pipes, flash floods on the streets, and combined sewer overflows. Retrofitting underground infrastructure is very expensive, thus operators are looking to deploy IoT solutions that promise to alleviate the problems at a fraction of the cost. In this paper, we report on preliminary results from an ongoing joint research project, specifically...

10.1109/bigdata50022.2020.9378138 article EN 2020 IEEE International Conference on Big Data (Big Data) 2020-12-10

Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing

Morgan K. Geldenhuys, B. Pfister, Dominik Scheinert, Lauritz Thamsen, Odej Kao

Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these results is dependent on a system's ability to tolerate failure. Typically, these systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing checkpoint and rollback recovery. However, owing to the statistical probability of failures occurring in distributed...

10.15439/2022f225 article EN cc-by Annals of Computer Science and Information Systems 2022-09-26

The Electronic Bee Spy: Eavesdropping on Honeybee Communication via Electrostatic Field Recordings

Benjamin H. Paffhausen, Julian Petrasch, Uwe Greggers, Aron Duer, Zhengwei Wang, and 10 more

As a canary in a coalmine warns of dwindling breathable air, the honeybee can indicate the health of an ecosystem. Honeybees are the most important pollinators of fruit-bearing flowers, and share similar ecological niches with many other pollinators; therefore, the health of a honeybee colony can reflect the conditions of a whole ecosystem. The health of a colony may be mirrored in the social signals that bees exchange during their sophisticated body movements such as the waggle dance. To observe these changes, we developed an automatic system that records and quantifies social signals under normal beekeeping...

10.3389/fnbeh.2021.647224 article EN cc-by Frontiers in Behavioral Neuroscience 2021-04-28

Evaluation of Load Prediction Techniques for Distributed Stream Processing

Kordian Gontarska, Morgan K. Geldenhuys, Dominik Scheinert, Philipp Wiesner, Andreas Polze, and 1 more

Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends, cyclic, and seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live...

10.1109/ic2e52221.2021.00023 article EN 2021-10-01

Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization

Morgan K. Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a good Quality of Service despite variable workloads. However, selecting scaleout configurations which maximize resource utilization remains a challenge. This is especially true in environments where workloads change over time and node failures are all but inevitable. Furthermore,...

10.48550/arxiv.2403.02129 preprint EN arXiv (Cornell University) 2024-03-04

Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems

Benjamin J. J. Pfister, Dominik Scheinert, Morgan K. Geldenhuys, Odej Kao

Distributed Stream Processing (DSP) systems are capable of processing large streams of unbounded data, offering high throughput and low latencies. To maintain a stable Quality of Service (QoS), these systems require a sufficient allocation of resources. At the same time, over-provisioning can result in wasted energy and high operating costs. Therefore, to maximize resource utilization, autoscaling methods have been proposed that aim to efficiently match resources with the incoming workload. However, determining when and by how much to scale...

10.48550/arxiv.2403.02093 preprint EN arXiv (Cornell University) 2024-03-04

Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization

Morgan K. Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a good Quality of Service despite variable workloads. However, selecting scaleout configurations which maximize resource utilization remains a challenge. This is especially true in environments where workloads change over time and node failures are all but inevitable. Furthermore,...

10.1145/3629526.3645048 article EN cc-by 2024-05-07

Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

Dominik Scheinert, Fabian Casares, Morgan K. Geldenhuys, Kevin Styp-Rekowski, Odej Kao

Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. DSP systems provide a solution to this challenge, offering high horizontal scalability, fault-tolerant execution, and the ability to process multiple data streams in a single job. Often enough though, data streams need to be enriched with extra information for correct processing,...

10.48550/arxiv.2307.14287 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads

Morgan K. Geldenhuys, Dominik Scheinert, Odej Kao, Lauritz Thamsen

Distributed Stream Processing systems have become an essential part of big data processing platforms. They are characterized by the high-throughput processing of near to real-time event streams with the goal of delivering low-latency results and thus enabling time-sensitive decision making. At the same time, results are expected to be consistent even in the presence of partial failures where exactly-once guarantees are required for correctness. Stream processing workloads are oftentimes dynamic in nature, which makes static configurations highly inefficient as time...

10.48550/arxiv.2206.09679 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Evaluation of Load Prediction Techniques for Distributed Stream Processing

Kordian Gontarska, Morgan K. Geldenhuys, Dominik Scheinert, Philipp Wiesner, Andreas Polze, and 1 more

Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends, cyclic, and seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live...

10.48550/arxiv.2108.04749 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

Dominik Scheinert, Fabian Casares, Morgan K. Geldenhuys, Kevin Styp-Rekowski, Odej Kao

Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. DSP systems provide a solution to this challenge, offering high horizontal scalability, fault-tolerant execution, and the ability to process multiple data streams in a single job. Often enough though, data streams need to be enriched with extra information for correct processing,...

10.1109/ic2e59103.2023.00030 article EN 2023-09-25

A Scalable and Dependable Data Analytics Platform for Water Infrastructure Monitoring

Felix Lorenz, Morgan K. Geldenhuys, Harald Sommer, Frauke Jakobs, Carsten Lüring, and 3 more

With weather becoming more extreme both in terms of longer dry periods and more severe rain events, municipal water networks are increasingly under pressure. The effects include damages to the pipes, flash floods on the streets, and combined sewer overflows. Retrofitting underground infrastructure is very expensive, thus operators are looking to deploy IoT solutions that promise to alleviate the problems at a fraction of the cost. In this paper, we report on preliminary results from an ongoing joint research project,...

10.48550/arxiv.2012.00400 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Chiron: Optimizing Fault Tolerance in QoS-aware Distributed Stream Processing Jobs

Morgan K. Geldenhuys, Lauritz Thamsen, Odej Kao

Fault tolerance is a property which needs deeper consideration when dealing with streaming jobs requiring high levels of availability and low-latency processing even in the case of failures where Quality-of-Service constraints must be adhered to. Typically, systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing Checkpoint and Rollback Recovery. However, this is an expensive operation which impacts negatively on the overall performance of the system, and manually optimizing for specific...

10.48550/arxiv.2102.06170 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Effectively Testing System Configurations of Critical IoT Analytics Pipelines

Morgan K. Geldenhuys, Lauritz Thamsen, Kain Kordian Gontarska, Felix Lorenz, Odej Kao

The emergence of the Internet of Things has seen the introduction of numerous connected devices used for the monitoring and control of even Critical Infrastructures. Distributed stream processing has become key to analyzing the data generated by these devices and improving our ability to make decisions. However, optimizing these systems towards specific Quality of Service targets is a difficult and time-consuming task, due to the large-scale distributed systems involved, the existence of so many configuration parameters, and the inability to easily determine the impact of tuning...

10.48550/arxiv.2102.06094 preprint EN cc-by-sa arXiv (Cornell University) 2021-01-01