- Data Stream Mining Techniques
- Anomaly Detection Techniques and Applications
- Time Series Analysis and Forecasting
- Advanced Bandit Algorithms Research
- Smart Grid Energy Management
- Advanced Database Systems and Queries
- Network Security and Intrusion Detection
- Machine Learning and Data Classification
- Data Quality and Management
- Data Mining Algorithms and Applications
- Water Systems and Optimization
- Internet Traffic Analysis and Secure E-voting
- Integrated Energy Systems Optimization
- Energy Load and Power Forecasting
- Auction Theory and Applications
- Healthcare Operations and Scheduling Optimization
- Music and Audio Processing
- Fault Detection and Control Systems
- Metaheuristic Optimization Algorithms Research
- Data Management and Algorithms
- Reinforcement Learning in Robotics
- Building Energy and Comfort Optimization
- Data Visualization and Analytics
- Big Data and Business Intelligence
- Microgrid Control and Optimization
Karlsruhe Institute of Technology
2018-2024
Load disaggregation methods infer the energy consumption of individual appliances from their aggregated consumption. This facilitates savings and efficient management. However, most existing work on load disaggregation has only considered household settings. This may be due to companies preferring not to share data, rendering such data hardly available.
The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, a decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): the goal is not only to maximize the cumulative rewards, i.e., choosing the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem...
Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only be able to identify when changes happen, but also in which subspace they occur. Ideally, one should also quantify how severe they are. Our approach, ABCD, has these...
The recent development of renewable energy sources (RES) challenges energy systems and opens many new research questions. Energy System Models (ESM) are important tools to study these problems. However, including RES into ESM strongly increases the model complexity, because one needs to model the fluctuating, weather-dependent electricity production from RES with a high level of granularity. This leads to long execution times. To deal with this issue, our objective is to reduce the input time series without losing their...
The increasing size of the available data and database volumes represents a real challenge for the data management community. In general, current approaches in data mining require the data to be first extracted from an underlying database. From a practical point of view, this presents many drawbacks. In this short article, we present a possible solution to bridge the gap between data repositories and end-user analysis. We demonstrate the interestingness of the approach with ibmdbpy, an open source Python interface developed by IBM for administration and analytics.
Estimating dependency is a fundamental task in data management. Identifying the relevant variables leads to a better understanding of data and improves both the runtime and the outcome of data analysis. In this paper, we propose Monte Carlo Dependency Estimation (MCDE), a framework to estimate multivariate dependency. MCDE quantifies dependency as the average discrepancy between marginal and conditional distributions via Monte Carlo simulations. Based on this framework, we present Mann-Whitney P (MWP), a novel dependency estimator. We show that MWP satisfies a number...
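The idea of averaging the discrepancy between a marginal and a conditional distribution over random simulations can be sketched for two variables as follows. This is a minimal illustration, not the MCDE or MWP implementation: the function names, the slicing scheme, and the use of a Kolmogorov-Smirnov-style gap in place of the Mann-Whitney-based estimator are all assumptions made for this example.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for value in a + b:
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def dependency_score(xs, ys, iterations=50, slice_fraction=0.5, seed=0):
    """Average discrepancy between ys restricted to a random slice of xs
    (conditional sample) and all ys (marginal sample).

    Hypothetical helper: the KS gap stands in for the discrepancy measure.
    """
    rng = random.Random(seed)
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])  # indices ranked by x
    width = max(1, int(n * slice_fraction))
    total = 0.0
    for _ in range(iterations):
        start = rng.randint(0, n - width)  # random contiguous slice in x-rank
        conditional = [ys[i] for i in order[start:start + width]]
        total += ks_statistic(conditional, ys)
    return total / iterations
```

Calling `dependency_score(xs, ys)` on two equally long lists of numbers yields a value in [0, 1]; under these assumptions, strongly dependent variables tend to score higher than independent ones.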
Nowadays, it is common to classify collections of documents into (human-generated, domain-specific) directory structures, such as email or document folders. But documents may be classified wrongly, for a multitude of reasons. Then they are outlying w.r.t. the folder they end up in. Orthogonally to this, and more specifically, two kinds of errors can occur: (O) Out-of-distribution: the document does not belong to any existing folder in the directory; (M) Misclassification: the document belongs to another folder. It is this specific combination of issues that we...
Estimating dependencies from data is a fundamental task of Knowledge Discovery. Identifying the relevant variables leads to a better understanding of data and improves both the runtime and the outcomes of downstream Data Mining tasks. Dependency estimation from static numerical data has received much attention. However, real-world data often occurs as heterogeneous data streams: On the one hand, data is collected online and is virtually infinite. On the other hand, the various components of a stream may be of different types, e.g., numerical, ordinal or categorical. For...
We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from K arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current approaches for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, ω-UCB, that uses asymmetric confidence intervals. These intervals scale...
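The budgeted setting described above can be illustrated with a generic optimistic policy that ranks arms by an upper bound on the reward divided by a lower bound on the cost. This is a minimal sketch and not ω-UCB: it uses symmetric Hoeffding-style bonuses, Bernoulli rewards and costs, and a hypothetical function name `budgeted_ucb`, all of which are assumptions for this example.

```python
import math
import random


def budgeted_ucb(reward_means, cost_means, budget, seed=0):
    """Play Bernoulli arms until the budget is spent; return pulls per arm.

    Assumes every arm has a positive expected cost, so the budget
    eventually runs out.
    """
    rng = random.Random(seed)
    k = len(reward_means)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    t = 0

    def index(i):
        # Optimistic reward-cost ratio: upper reward bound / lower cost bound.
        bonus = math.sqrt(math.log(t) / (2 * pulls[i]))
        upper_reward = min(1.0, reward_sum[i] / pulls[i] + bonus)
        lower_cost = max(1e-3, cost_sum[i] / pulls[i] - bonus)
        return upper_reward / lower_cost

    while budget > 0:
        t += 1
        if t <= k:
            arm = t - 1  # play each arm once to initialize the estimates
        else:
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < reward_means[arm] else 0.0
        cost = 1.0 if rng.random() < cost_means[arm] else 0.0
        pulls[arm] += 1
        reward_sum[arm] += reward
        cost_sum[arm] += cost
        budget -= cost
    return pulls
```

For instance, `budgeted_ucb([0.9, 0.5, 0.2], [0.3, 0.5, 0.8], budget=200)` returns one pull count per arm. The clipping of the cost bound at a small positive value only avoids division by zero; how tightly the intervals are built is exactly the kind of design choice the abstract addresses.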
The amount of spatial data acquired from crowdsourced platforms, mobile devices, sensors and cartographic agencies has grown exponentially over the past few years. Nearly half of the available spatial data are currently stored and processed through large relational databases. Due to a lack of generic open source tools, researchers and analysts often have difficulty in extracting and analyzing large amounts of spatial data from traditional databases. In order to overcome this challenge, the most effective way is to perform the analysis directly in the database, which enables quick...
We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a bandit algorithm class that leverages adaptive windowing techniques from the literature on data streams. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest. Furthermore, we conduct a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment where changes occur instantaneously and a gradual...
Today, the collection of decentralized data is a common scenario: smartphones store users' messages locally, smart meters collect energy consumption data, and modern power tools monitor operator behavior. We identify different types of outliers in such data: local, global, and partition outliers. They contain valuable information, for example, about mistakes in operation. However, existing outlier detection approaches cannot distinguish between those types. Thus, we propose a "tandem" technique to join...
We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, $\omega$-UCB, that uses asymmetric confidence intervals. These...
Estimating the dependency of variables is a fundamental task in data analysis. Identifying the relevant attributes in databases leads to better understanding and also improves the performance of learning algorithms, both in terms of runtime and quality. In data streams, dependency monitoring provides key insights into the underlying process, but is challenging. In this paper, we propose Monte Carlo Dependency Estimation (MCDE), a theoretical framework to estimate multivariate dependency in static and dynamic data. MCDE quantifies dependency as the average discrepancy between...
Data Mining, known as the process of extracting knowledge from massive data sets, leads to phenomenal impacts on our society, and now affects nearly every aspect of our lives: the layout in our local grocery store, the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, or the efficiency of industrial production processes. However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and when (2) data comes as a stream. Extracting knowledge from high-dimensional data streams...
Detecting changes is of fundamental importance when analyzing data streams and has many applications, e.g., in predictive maintenance, fraud detection, or medicine. A principled approach to detect changes is to compare the distributions of observations within the stream to each other via hypothesis testing. Maximum mean discrepancy (MMD; also called energy distance) is a well-known (semi-)metric on the space of probability distributions. MMD gives rise to powerful non-parametric two-sample tests on kernel-enriched domains under mild...
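To make the MMD quantity concrete, the following minimal sketch computes the biased (V-statistic) estimate of the squared MMD between two one-dimensional samples. The Gaussian kernel, the bandwidth value, and the function names are assumptions chosen for illustration; the sketch covers only the statistic, not a change detector or a test threshold.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers; the bandwidth is an assumed default."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)
```

In a stream setting, `xs` and `ys` would typically be two adjacent windows of observations; a large value suggests that the windows follow different distributions. Note that this naive estimate costs time quadratic in the window sizes.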