David Böhme

ORCID: 0000-0002-4159-1519
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed systems and fault tolerance
  • Gene Regulatory Network Analysis
  • Advanced Data Storage Technologies
  • Software System Performance and Reliability
  • Interconnection Networks and Systems
  • Evolutionary Algorithms and Applications
  • Cloud Computing and Resource Management
  • Distributed and Parallel Computing Systems
  • Cellular Automata and Applications
  • CCD and CMOS Imaging Sensors
  • Welding Techniques and Residual Stresses
  • Remote Sensing and LiDAR Applications
  • Atmospheric aerosols and clouds
  • Advanced Software Engineering Methodologies
  • Remote Sensing in Agriculture
  • Electromagnetic Scattering and Analysis
  • Advanced Welding Techniques Analysis
  • Real-Time Systems Scheduling
  • African history and culture analysis
  • Context-Aware Activity Recognition Systems
  • Neural Networks and Applications
  • Music Technology and Sound Studies
  • Software Engineering Research
  • Quality and Management Systems

Lawrence Livermore National Laboratory
2015-2020

Forschungszentrum Jülich
2009-2018

German Research School for Simulation Sciences
2012-2014

RWTH Aachen University
2010-2013

Institute for Advanced Study
2010

University of Potsdam
2008

Welding Institute (Slovenia)
1998

The critical path, which describes the longest execution sequence without wait states in a parallel program, identifies the activities that determine the overall program runtime. Combining knowledge of the critical path with traditional profiles, we have defined a set of compact performance indicators that help answer a variety of important performance-analysis questions, such as identifying load imbalance, quantifying the impact of imbalance on runtime, and characterizing resource consumption. By replaying event traces in parallel, we can...

10.1109/ipdps.2012.120 article EN 2012-05-01

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard...

10.1109/icpp.2010.18 article EN 2010-09-01

The fat-tree topology is one of the most commonly used network topologies in HPC systems. Vendors support several options that can be configured when deploying fat-tree networks on production systems, such as link bandwidth, number of rails, number of planes, and tapering. This paper showcases the use of simulations to compare the impact of these design options on representative applications, libraries, and multi-job workloads. We present advances in the TraceR-CODES simulation framework that enable this analysis and evaluate its prediction accuracy against...

10.1145/3126908.3126967 article EN 2017-11-08

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard...

10.1145/2934661 article EN ACM Transactions on Parallel Computing 2016-07-20

Cray XT and IBM Blue Gene systems present current alternative approaches to constructing leadership computer systems relying on applications being able to exploit very large configurations of processor cores, and associated analysis tools must also scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive execution performance deterioration of the well-known ASCI...

10.1142/s0129626410000314 article EN Parallel Processing Letters 2010-12-01

Load or communication imbalance prevents many codes from taking advantage of the parallelism available on modern supercomputers. We present two scalable methods to highlight imbalance in parallel programs: The first method identifies delays that inflict wait states at subsequent synchronization points, and attributes their costs in terms of resource waste to the original cause. The second method combines knowledge of the critical path with traditional profiles to derive a set of compact performance indicators that help answer a variety...

10.1109/ipdpsw.2012.321 article EN 2012-05-01

Asynchrony and non-determinism in Charm++ programs present a significant challenge in analyzing their event traces. We present a new framework to organize event traces of parallel programs written in Charm++. Our reorganization allows one to more easily explore and analyze such traces by providing context through logical structure. We describe several heuristics to compensate for missing dependencies between events that currently cannot be recorded. We introduce a new task ordering that recovers logical structure from the non-deterministic execution order. Using...

10.1145/2807591.2807634 article EN 2015-10-27

In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on IBM Blue Gene/P, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization quantified communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation...

10.1109/ipdpsw.2010.5470816 article EN 2010-04-01

To better understand the formation of wait states in MPI programs and to support the user in finding optimization targets in the case of load imbalance, a major source of wait states, we added in our earlier work two new trace-analysis techniques to Scalasca, a performance analysis tool designed for large-scale applications. In this paper, we show how the two techniques, which were originally restricted to two-sided and collective communication, are extended to cover also one-sided communication. We demonstrate our experiences with benchmark...

10.1145/2488551.2488569 article EN 2013-09-11

Load imbalance usually introduces wait states into the execution of parallel programs. Being able to identify and quantify wait states is therefore essential for the diagnosis and remediation of this phenomenon. An established method of detecting wait states is to generate event traces and compare relevant timestamps across process boundaries. However, large trace volumes prevent the analysis of longer execution periods. In this paper, we present an extremely lightweight wait-state profiler which does not rely on traces and that can be used to estimate wait states in MPI codes with...

10.1145/2642769.2642783 article EN 2014-08-29