- Parallel Computing and Optimization Techniques
- Advanced Database Systems and Queries
- Cloud Computing and Resource Management
- Machine Learning and Data Classification
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Graph Theory and Algorithms
- Business Process Modeling and Analysis
- Service-Oriented Architecture and Web Services
- Scientific Computing and Data Management
- Data Quality and Management
- Stochastic Gradient Optimization Techniques
- Software System Performance and Reliability
- Machine Learning and Algorithms
- Digital Innovation in Industries
- Machine Learning in Materials Science
- Software Engineering Research
- Time Series Analysis and Forecasting
- Explainable Artificial Intelligence (XAI)
- Information Technology Governance and Strategy
- Data Stream Mining Techniques
- Scheduling and Optimization Algorithms
- ERP Systems Implementation and Impact
- Stock Market Forecasting Methods
- Data Mining Algorithms and Applications
Technische Universität Berlin
2023-2025
Graz University of Technology
2019-2022
IBM Research - Almaden
2013-2019
IBM (United States)
2014-2017
Technische Universität Dresden
2008-2014
Osnabrück University
2011-2013
University of Münster
2010
Hochschule für Technik und Wirtschaft Dresden – University of Applied Sciences
2008-2009
The rising need for custom machine learning (ML) algorithms and the growing data sizes that require exploitation of distributed, data-parallel frameworks such as MapReduce or Spark pose significant productivity challenges to data scientists. Apache SystemML addresses these challenges through declarative ML by (1) increasing the productivity of data scientists, who are able to express custom algorithms in a familiar domain-specific language covering linear algebra primitives and statistical functions, and (2) transparently running these algorithms on distributed, data-parallel frameworks by applying cost-based...
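To illustrate the style of specification the abstract describes, here is a minimal sketch of an ML algorithm written purely in linear algebra primitives, the way a declarative DSL such as SystemML's exposes them (shown here in NumPy rather than DML; the function name `lm` and the ridge parameter `reg` are illustrative, not SystemML's API):

```python
import numpy as np

def lm(X, y, reg=1e-3):
    # Linear regression via the (regularized) normal equations,
    # expressed only with linear algebra primitives. A declarative
    # ML system can compile such a script to local or distributed
    # execution plans without the author changing the code.
    A = X.T @ X + reg * np.eye(X.shape[1])
    b = X.T @ y
    return np.linalg.solve(A, b)
```

The point of the declarative approach is that this same high-level script could run unchanged on a single node or on a data-parallel framework, with the system choosing the plan.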
Large-scale data analytics using statistical machine learning (ML), popularly called advanced analytics, underpins many modern data-driven applications. The data management community has been working for over a decade on tackling data-management-related challenges that arise in ML workloads, and has built several systems for advanced analytics. This tutorial provides a comprehensive review of such systems and analyzes key techniques. We focus on three complementary lines of work: (1) integrating ML algorithms and languages with existing systems such as...
SystemML aims at declarative, large-scale machine learning (ML) on top of MapReduce, where high-level ML scripts with R-like syntax are compiled to programs of MR jobs. The declarative specification of ML algorithms enables---in contrast to existing libraries---automatic optimization. SystemML's primary focus is on data parallelism, but many ML algorithms inherently exhibit opportunities for task parallelism as well. A major challenge is how to efficiently combine both types of parallelism for arbitrary ML scripts and workloads. In this paper, we present a systematic...
Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory. General-purpose, heavyweight and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Hence, we initiate work on compressed linear algebra (CLA), in which...
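The core idea of compressed linear algebra is to execute operations such as matrix-vector multiplication directly on compressed matrices, without decompressing. A minimal sketch with per-column dictionary encoding (the actual CLA design uses richer column encodings and co-coding; `compress_column` and `mv_compressed` are hypothetical names):

```python
import numpy as np

def compress_column(col):
    # Dictionary encoding: small array of distinct values plus a
    # per-row code vector referencing those values.
    vals, codes = np.unique(col, return_inverse=True)
    return vals, codes

def mv_compressed(cols, v):
    # y = X @ v computed on the compressed columns: scale each
    # column's small dictionary by v[j] once, then scatter via the
    # code vector -- X itself is never decompressed.
    n = len(cols[0][1])
    y = np.zeros(n)
    for j, (vals, codes) in enumerate(cols):
        y += (vals * v[j])[codes]
    return y

X = np.array([[1., 0.], [1., 2.], [0., 2.], [1., 0.]])
cols = [compress_column(X[:, j]) for j in range(X.shape[1])]
v = np.array([3., 0.5])
y = mv_compressed(cols, v)
```

Scaling the dictionary instead of every cell is where the speedup comes from: work per column is proportional to the number of distinct values plus one scatter, not to a full decompression.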
Declarative large-scale machine learning (ML) aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single-node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks. State-of-the-art compilers in this context are very sensitive to memory constraints of the master process and the MR cluster configuration. Different configurations can lead to significant performance differences. Interestingly, resource negotiation frameworks like...
Many machine learning (ML) systems allow the specification of ML algorithms by means of linear algebra programs, and automatically generate efficient execution plans. The opportunities for fused operators---in terms of fused chains of basic operators---are ubiquitous, and include fewer materialized intermediates, fewer scans of inputs, and sparsity exploitation across operators. However, existing fusion heuristics struggle to find good plans for complex operator DAGs or hybrid plans of local and distributed operations. In this paper, we...
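The benefits of operator fusion listed in the abstract can be seen on a small example such as `sum(X * Y * Z)`. A minimal sketch contrasting the unfused evaluation with a fused, sparsity-exploiting one (illustrative functions, not the paper's code generator):

```python
import numpy as np

def sum_mult3_unfused(X, Y, Z):
    # Materializes two full-size intermediates (X*Y, then (X*Y)*Z)
    # and scans the data multiple times.
    return float(np.sum(X * Y * Z))

def sum_mult3_fused(X, Y, Z):
    # Single pass, no materialized intermediates; because the
    # expression is sparse-safe, only nonzero cells of X contribute.
    s = 0.0
    rows, cols = np.nonzero(X)
    for i, j in zip(rows, cols):
        s += X[i, j] * Y[i, j] * Z[i, j]
    return s
```

For a sparse `X`, the fused variant touches only `nnz(X)` cells, which is exactly the kind of cross-operator sparsity exploitation a fusion optimizer searches for.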
Nowadays, Renewable Energy Sources (RES) are attracting more and more interest. Thus, many countries that aim to increase the share of green energy have to face several challenges (e.g., balancing, storage, pricing). In this paper, we address the balancing challenge and present the MIRABEL project, which aims to prototype an Energy Data Management System (EDMS) that takes benefit of flexibilities to efficiently balance energy demand and supply. The EDMS consists of millions of heterogeneous nodes, each of which incorporates advanced components...
Slice finding---a recent line of work on debugging machine learning (ML) models---aims to find the top-K data slices (e.g., conjunctions of predicates such as gender equals female and degree equals PhD) on which a trained model performs significantly worse than on the entire training/test data. These slices may be used to acquire more data for the problematic subset, add rules, or otherwise improve the model. In contrast to decision trees, the general slice finding problem allows overlapping slices. The resulting search space is huge as it covers all...
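A brute-force sketch of the slice-finding problem as stated: enumerate conjunctions of predicates and rank slices by how much their average error exceeds the overall average. (The actual work uses a scoring function that also weighs slice size, plus linear-algebra-based enumeration with pruning; `find_slices` here is a hypothetical illustration only.)

```python
from itertools import combinations
import numpy as np

def find_slices(features, errors, k=3, max_pred=2):
    # features: per-row list of (attribute, value) pairs
    # errors:   per-row loss (e.g., 0/1 misclassification)
    overall = errors.mean()
    preds = sorted({p for row in features for p in row})
    scored = []
    for size in range(1, max_pred + 1):
        for conj in combinations(preds, size):
            mask = np.array([all(p in row for p in conj) for row in features])
            if mask.sum() == 0:
                continue  # contradictory/empty conjunction
            # score: how much worse the slice is than the overall data
            scored.append((errors[mask].mean() - overall, int(mask.sum()), conj))
    scored.sort(key=lambda t: -t[0])
    return scored[:k]
```

Even this toy version shows why the search space explodes: the number of conjunctions grows combinatorially in the number of distinct predicates, and overlapping slices cannot be pruned the way disjoint tree leaves can.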
Exploitation of parallel architectures has become critical to scalable machine learning (ML). Since a wide range of ML algorithms employ linear algebraic operators, GPUs with BLAS libraries are a natural choice for such an exploitation. Two approaches are commonly pursued: (i) developing specific GPU-accelerated implementations of complete ML algorithms; and (ii) developing GPU kernels for primitive operators like matrix-vector multiplication, which are then used in ML algorithms. This paper extends the latter approach by developing fused...
Time series data from a variety of sensors and IoT devices need effective compression to reduce storage and I/O bandwidth requirements. While most time series database systems rely on lossless compression, lossy techniques offer even greater space savings with a small loss in precision. However, their unknown impact on downstream analytics applications requires semi-manual trial-and-error exploration. We initiate work on lossy compression that provides guarantees on complex statistical features (which are strongly correlated...
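As a simple intuition for how a lossy scheme can carry guarantees on downstream statistics, consider uniform scalar quantization with a point-wise error bound eps: the reconstruction error of each value is at most eps, which in turn bounds the shift of the series mean by eps. (This is a generic illustration, not the paper's technique; `quantize`/`dequantize` are hypothetical names.)

```python
import numpy as np

def quantize(ts, eps):
    # Uniform scalar quantization with point-wise error <= eps.
    # Rounding to multiples of 2*eps gives |x - round(x)| <= eps,
    # so the mean of the reconstruction shifts by at most eps too.
    step = 2 * eps
    codes = np.round(ts / step).astype(np.int64)
    return codes, step

def dequantize(codes, step):
    return codes * step
```

The integer code stream is then highly compressible (e.g., via delta or entropy coding), which is where the space savings over lossless schemes come from.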
Open science and data exchange in general rely on standardized, interoperable file formats. Comma-separated value (CSV) files are probably the most versatile, simplest, and most widely-used format for tabular data. For example, the FAIR principles of research data management promote findable, accessible, interoperable, and reusable data and metadata. In this context, CSV files ensure accessibility and interoperability because of their simple structure and text-based format, making them amenable to long-term storage. An analysis by Google Dataset...
Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide the means of specifying and executing these workloads in an efficient, scalable...
Large-scale Machine Learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications. Hence, it is crucial for performance to fit the data into single-node or distributed main memory to enable fast matrix-vector operations. General-purpose compression struggles to achieve both good compression ratios and fast decompression for block-wise uncompressed operations. Therefore, we introduce Compressed Linear Algebra (CLA) for lossless matrix compression. CLA encodes matrices with lightweight,...
Machine learning (ML) and data science workflows are inherently exploratory. Data scientists pose hypotheses, integrate the necessary data, and run ML pipelines of data cleaning, feature engineering, model selection, and hyper-parameter tuning. The repetitive nature of these workflows, and their hierarchical composition from building blocks, exhibits high computational redundancy. Existing work addresses this redundancy with coarse-grained lineage tracing and reuse for entire pipelines. This approach allows using existing...
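The mechanism behind lineage-based reuse can be sketched as memoization keyed by a hash over an operation and the lineage traces of its inputs: two computations with identical lineage must produce identical results and can share one cached value. (A minimal illustration under that assumption; `LineageCache` is a hypothetical class, not the system's API, and real fine-grained reuse adds eviction, compensation, and multi-level reuse.)

```python
import hashlib

class LineageCache:
    """Memoize operator results keyed by input lineage."""

    def __init__(self):
        self.cache = {}

    def trace(self, op, *input_traces):
        # Lineage of a result = hash of the operation and the
        # lineage traces of its inputs (deterministic ops assumed).
        key = repr((op, input_traces)).encode()
        return hashlib.sha256(key).hexdigest()

    def execute(self, op, fn, *inputs_with_traces):
        # inputs_with_traces: (value, lineage_trace) pairs
        traces = tuple(t for _, t in inputs_with_traces)
        key = self.trace(op, *traces)
        if key not in self.cache:  # compute only on a cache miss
            self.cache[key] = fn(*(v for v, _ in inputs_with_traces))
        return self.cache[key], key
```

Because the key is built from lineage rather than the (potentially large) input data, lookup cost is independent of data size.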
Declarative machine learning (ML) aims at the high-level specification of ML tasks or algorithms, and automatic generation of optimized execution plans from these specifications. The fundamental goal is to simplify usage and/or development, which is especially important in the context of large-scale computations. However, systems at different abstraction levels have emerged over time, and accordingly there has been a controversy about the meaning of this general definition of declarative ML. Specification alternatives...
Data science workflows are largely exploratory, dealing with under-specified objectives, open-ended problems, and unknown business value. Therefore, little investment is made in the systematic acquisition, integration, and pre-processing of data. This lack of infrastructure results in redundant manual effort and computation. Furthermore, central data consolidation is not always technically or economically desirable, or even feasible (e.g., due to privacy and/or data ownership). The ExDRa system aims to provide infrastructure for this...
Efficiently computing linear algebra expressions is central to machine learning (ML) systems. Most systems support sparse formats and operations because sparse matrices are ubiquitous and their dense representation can cause prohibitive overheads. Estimating the sparsity of intermediates, however, remains a key challenge when generating execution plans or performing sparse operations. These sparsity estimates are used for cost and memory estimates, format decisions, and result allocation. Existing sparsity estimators tend to focus on matrix...
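A classic baseline for the estimation problem described here: under the (often violated) assumption that nonzeros are uniformly and independently distributed, the output sparsity of a matrix product C = A(m×k) · B(k×n) is P[C_ij ≠ 0] = 1 − (1 − sA·sB)^k. A one-function sketch of this metadata-only estimator (illustrative; more accurate estimators like the one in the paper use per-row/column nonzero counts):

```python
def estimate_mm_sparsity(sA, sB, k):
    # sA, sB: fraction of nonzero cells in A and B; k: common dim.
    # Each of the k products A[i,l]*B[l,j] is nonzero with prob.
    # sA*sB; C[i,j] is nonzero if at least one of them is.
    return 1.0 - (1.0 - sA * sB) ** k
```

Note the estimate depends only on metadata (two sparsity scalars and a dimension), which is why optimizers can apply it per operator while compiling a plan; its weakness is exactly the independence assumption, which skewed real matrices violate.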