NFDI4DS | UHH-SEMS - Publication Details

David Böhme

ORCID: 0000-0002-4159-1519


About

Contact & Profiles

A5008218695

Research Areas

Parallel Computing and Optimization Techniques
Distributed systems and fault tolerance
Gene Regulatory Network Analysis
Advanced Data Storage Technologies
Software System Performance and Reliability
Interconnection Networks and Systems
Evolutionary Algorithms and Applications
Cloud Computing and Resource Management
Distributed and Parallel Computing Systems
Cellular Automata and Applications
CCD and CMOS Imaging Sensors
Welding Techniques and Residual Stresses
Remote Sensing and LiDAR Applications
Atmospheric aerosols and clouds
Advanced Software Engineering Methodologies
Remote Sensing in Agriculture
Electromagnetic Scattering and Analysis
Advanced Welding Techniques Analysis
Real-Time Systems Scheduling
African history and culture analysis
Context-Aware Activity Recognition Systems
Neural Networks and Applications
Music Technology and Sound Studies
Software Engineering Research
Quality and Management Systems

Lawrence Livermore National Laboratory
2015-2020

Forschungszentrum Jülich
2009-2018

German Research School for Simulation Sciences
2012-2014

RWTH Aachen University
2010-2013

Institute for Advanced Study
2010

University of Potsdam
2008

Welding Institute (Slovenia)
1998

Scalable Critical-Path Based Performance Analysis

OPENALEX - Publications

David Böhme Felix Wolf Bronis R. de Supinski Martin Schulz Markus Geimer

The critical path, which describes the longest execution sequence without wait states in a parallel program, identifies the activities that determine the overall program runtime. Combining knowledge of the critical path with traditional parallel profiles, we have defined a set of compact performance indicators that help answer a variety of important performance-analysis questions, such as identifying load imbalance, quantifying the impact of imbalance on runtime, and characterizing resource consumption. By replaying event traces in parallel, we can...

10.1109/ipdps.2012.120 article EN 2012-05-01

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications


David Böhme Markus Geimer Felix Wolf Lukas Arnold

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard...

10.1109/icpp.2010.18 article EN 2010-09-01

Predicting the performance impact of different fat-tree configurations


Nikhil Jain Abhinav Bhatelé Louis H. Howell David Böhme Ian Karlin and 5 more

The fat-tree topology is one of the most commonly used network topologies in HPC systems. Vendors support several options that can be configured when deploying fat-tree networks on production systems, such as link bandwidth, number of rails, number of planes, and tapering. This paper showcases the use of simulations to compare the impact of these design options on representative applications, libraries, and multi-job workloads. We present advances in the TraceR-CODES simulation framework that enable this analysis and evaluate its prediction accuracy against...

10.1145/3126908.3126967 article EN 2017-11-08

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications


David Böhme Markus Geimer Lukas Arnold Felix Voigtlaender Felix Wolf

10.1145/2934661 article EN ACM Transactions on Parallel Computing 2016-07-20

LARGE-SCALE PERFORMANCE ANALYSIS OF SWEEP3D WITH THE SCALASCA TOOLSET


Brian J. N. Wylie Markus Geimer Bernd Mohr David Böhme Zoltán Szebenyi and 1 more

Cray XT and IBM Blue Gene systems present current alternative approaches to constructing leadership computer systems relying on applications being able to exploit very large configurations of processor cores, and associated analysis tools must also scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive execution performance deterioration of the well-known ASCI...

10.1142/s0129626410000314 article EN Parallel Processing Letters 2010-12-01

Parallel software for retrieval of aerosol distribution from LIDAR data in the framework of EARLINET-ASOS


Lukas Osterloh Carlos Pérez García‐Pando David Böhme J. M. Baldasano Christine Böckmann and 2 more

10.1016/j.cpc.2009.06.011 article EN Computer Physics Communications 2009-06-17

Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications


David Böhme Felix Wolf Markus Geimer

Load or communication imbalance prevents many codes from taking advantage of the parallelism available on modern supercomputers. We present two scalable methods to highlight imbalance in parallel programs: The first method identifies delays that inflict wait states at subsequent synchronization points, and attributes their costs in terms of resource waste to the original cause. The second method combines knowledge of the critical path with traditional parallel profiles to derive a set of compact performance indicators that help answer a variety...

10.1109/ipdpsw.2012.321 article EN 2012-05-01

Recovering logical structure from Charm++ event traces


Katherine E. Isaacs Abhinav Bhatelé Jonathan Lifflander David Böhme Todd Gamblin and 3 more

Asynchrony and non-determinism in Charm++ programs present a significant challenge in analyzing their event traces. We present a new framework to organize event traces of parallel programs written in Charm++. Our reorganization allows one to more easily explore and analyze such traces by providing context through logical structure. We describe several heuristics to compensate for missing dependencies between events that currently cannot be recorded. We introduce a task ordering that recovers logical structure from the non-deterministic execution order. Using...

10.1145/2807591.2807634 article EN 2015-10-27

Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset


Brian J. N. Wylie David Böhme Bernd Mohr Zoltán Szebenyi Felix Wolf

In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on IBM Blue Gene/P, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization quantified communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation...

10.1109/ipdpsw.2010.5470816 article EN 2010-04-01

Understanding the formation of wait states in applications with one-sided communication


Marc-André Hermanns Manfred Miklosch David Böhme Felix Wolf

To better understand the formation of wait states in MPI programs and to support the user in finding optimization targets in the case of load imbalance, a major source of wait states, we added in our earlier work two new trace-analysis techniques to Scalasca, a performance analysis tool designed for large-scale applications. In this paper, we show how the two techniques, which were originally restricted to two-sided and collective communication, are extended to cover also one-sided communication. We demonstrate our experiences with benchmark...

10.1145/2488551.2488569 article EN 2013-09-11

Catching Idlers with Ease


Guoyong Mao David Böhme Marc-André Hermanns Markus Geimer Daniel Lorenz and 1 more

Load imbalance usually introduces wait states into the execution of parallel programs. Being able to identify and quantify wait states is therefore essential for the diagnosis and remediation of this phenomenon. An established method of detecting wait states is to generate event traces and compare relevant timestamps across process boundaries. However, large trace volumes prevent the analysis of longer execution periods. In this paper, we present an extremely lightweight wait-state profiler which does not rely on traces and can be used to estimate wait states in MPI codes with...

10.1145/2642769.2642783 article EN 2014-08-29
