Dong H. Ahn

ORCID: 0000-0001-6722-0532
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Software System Performance and Reliability
  • Scientific Computing and Data Management
  • Cloud Computing and Resource Management
  • Distributed Systems and Fault Tolerance
  • Low-Power High-Performance VLSI Design
  • Software Testing and Debugging Techniques
  • Research Data Management Practices
  • Interconnection Networks and Systems
  • Algorithms and Data Compression
  • Advanced Software Engineering Methodologies
  • Radiation Effects in Electronics
  • Advanced Database Systems and Queries
  • Scheduling and Optimization Algorithms
  • Embedded Systems Design Techniques
  • Big Data and Business Intelligence
  • Software Engineering Research
  • Numerical Methods and Algorithms
  • Protein Structure and Dynamics
  • Computational Physics and Python Applications
  • Peer-to-Peer Network Technologies
  • Speech and Audio Processing
  • Machine Learning in Bioinformatics

Nvidia (United States)
2022-2023

Lawrence Livermore National Laboratory
2013-2022

Bavarian Academy of Sciences and Humanities
2021

Leibniz Supercomputing Centre
2021

Irish Centre for High-End Computing
2021

National University of Ireland
2021

Red Hat (United States)
2021

IBM (United States)
2021

Lawrence Livermore National Security
2018

University of Utah
2018

Dynamic Voltage Frequency Scaling (DVFS) has been the tool of choice for balancing power and performance in high-performance computing (HPC). With the introduction of Intel's Sandy Bridge family of processors, researchers now have a far more attractive option: user-specified, dynamic, hardware-enforced processor power bounds. In this paper we provide a first look at this technology in the HPC environment and detail both the opportunities and the potential pitfalls of using this technique to control power. As part of our evaluation we measure...

10.1109/ipdpsw.2012.116 article EN 2012-05-01

We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce problem exploration spaces from thousands of processes to a few by sampling stack traces to form process equivalence classes, groups of processes exhibiting similar behavior. We then use full-featured debuggers on representatives of these behavior classes for root cause analysis. STAT scalably collects stack traces over a sampling period to assemble a profile of the application's routines, merging the samples into a call graph prefix tree that encodes common...

10.1109/ipdps.2007.370254 article EN 2007-01-01
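The core data structure behind STAT's reduction, a call-graph prefix tree that groups ranks with identical call paths into equivalence classes, can be sketched as follows. This is a minimal illustration of the idea only, not STAT's actual implementation; the rank numbering and frame names are hypothetical.

```python
def merge_traces(traces):
    """Merge per-rank stack traces (root-to-leaf frame lists) into a
    prefix tree; each node records which ranks passed through it."""
    tree = {"frame": "<root>", "ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = tree
        node["ranks"].add(rank)
        for f in frames:
            node = node["children"].setdefault(
                f, {"frame": f, "ranks": set(), "children": {}})
            node["ranks"].add(rank)
    return tree

def equivalence_classes(tree):
    """Group ranks by full call path: each leaf of the prefix tree is one
    behavior class, so a debugger only needs one representative per class."""
    classes = {}
    def walk(node, path):
        path = path + (node["frame"],)
        if not node["children"]:
            classes[path] = sorted(node["ranks"])
        for child in node["children"].values():
            walk(child, path)
    walk(tree, ())
    return classes
```

With thousands of ranks stuck in the same barrier, this collapses the search space to one class per distinct call path.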

The economics of flash vs. disk storage is driving HPC centers to incorporate faster solid-state burst buffers into the storage hierarchy in exchange for smaller parallel file system (PFS) bandwidth. In systems with an underprovisioned PFS, avoiding I/O contention at the PFS level will become crucial to achieving high computational efficiency. In this paper, we propose novel batch job scheduling techniques that reduce such contention by integrating I/O awareness into scheduling policies such as EASY backfilling. We model available bandwidth...

10.1145/2907294.2907316 article EN 2016-05-31
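The I/O-aware backfilling idea can be sketched as a single scheduling pass: a waiting job may jump the queue only if it fits the free nodes, fits the remaining PFS bandwidth (the I/O-aware addition), and finishes before the head job's reservation (the EASY rule). The job fields and units below are hypothetical, not the paper's actual model.

```python
def io_aware_backfill(queue, free_nodes, free_bw, shadow_time, now):
    """One EASY-style backfill pass over the waiting queue.
    free_bw models the remaining PFS bandwidth budget; a job whose I/O
    demand would oversubscribe it is skipped even if nodes are free."""
    started = []
    for job in list(queue):
        if (job["nodes"] <= free_nodes
                and job["bw"] <= free_bw
                and now + job["walltime"] <= shadow_time):
            started.append(job["id"])
            free_nodes -= job["nodes"]
            free_bw -= job["bw"]
            queue.remove(job)
    return started, free_nodes, free_bw
```

A plain EASY backfiller is the same loop without the bandwidth check, which is exactly how it creates PFS contention.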

OpenMP plays a growing role as a portable programming model to harness on-node parallelism, yet existing data race checkers for OpenMP have high overheads and generate many false positives. In this paper, we propose the first OpenMP race checker, ARCHER, that achieves high accuracy, low overheads on large applications, and portability. ARCHER incorporates scalable happens-before tracking, exploits structured parallelism via combined static and dynamic analysis, and modularly interfaces with OpenMP runtimes. It significantly outperforms TSan and Intel®...

10.1109/ipdps.2016.68 article EN 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2016-05-01

Resource and job management software is crucial to High Performance Computing (HPC) for efficient application execution. However, current systems and approaches can no longer keep up with the challenges large HPC centers are facing due to ever-increasing system scales, resource and workload diversity, interplays between various resources (e.g., compute clusters and a global file system), and the complexity of constraints such as strict power budgeting. To address this gap, we propose Flux, an extensible framework...

10.1109/icppw.2014.15 article EN 2014-09-01

We improved the quality of, and reduced the time to produce, machine-learned models for use in small molecule antiviral design. Our globally asynchronous multi-level parallel training approach strong scales to all of Sierra with up to 97.7% efficiency. We trained a novel, character-based Wasserstein autoencoder that produces a higher-quality model on 1.613 billion compounds in 23 minutes, while the previous state of the art takes a day on 1 million compounds. Reducing training time from a day to minutes shifts the model creation bottleneck from computer job turnaround to human...

10.1177/10943420211010930 article EN cc-by-nc The International Journal of High Performance Computing Applications 2021-05-03

We present a scalable temporal order analysis technique that supports debugging of large-scale applications by classifying MPI tasks based on their logical program execution order. Our approach combines static analysis techniques with dynamic analysis to determine this order scalably. It uses scalable stack trace analysis to guide the selection of critical points in anomalous application runs. A novel temporal ordering engine then leverages this information along with the application's static control structure to apply data flow analysis to key variables such as loop control variables. We use lightweight...

10.1145/1654059.1654104 article EN 2009-11-14

Today's largest systems have over 100,000 cores, with million-core systems expected in the next few years. This growing scale makes debugging the applications that run on them a daunting challenge. Few tools perform well at this scale, and most provide an overload of information about the entire job. Developers need to quickly direct debugging efforts to the root cause of the problem. This paper presents AutomaDeD, a tool that identifies which tasks of a large-scale application first manifest a bug at a specific code region and program execution point. AutomaDeD statistically...

10.1109/dsn.2010.5544927 article EN 2010-06-01

For job allocation decisions, current batch schedulers have access to and use only information on the number of nodes and the runtime, because it is readily available at submission time from user scripts. User-provided runtimes are typically inaccurate because users overestimate or lack an understanding of resource requirements. Beyond runtime, other system resources, including I/O and the network, are not available at submission time but play a key role in performance. There is a need for automatic, general, and scalable tools that provide accurate usage information so that,...

10.1145/3225058.3225091 article EN 2018-08-08

Scientific workflows have been used almost universally across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs)...

10.48550/arxiv.2103.09181 preprint EN cc-by-sa arXiv (Cornell University) 2021-01-01

Contemporary microprocessors provide a rich set of integrated performance counters that allow application developers and system architects alike the opportunity to gather important information about workload behaviors. Current techniques for analyzing the data produced from these counters use raw counts, ratios, and visualization to help users make decisions about their performance. While these techniques are appropriate for one process, they do not scale easily to the new levels demanded by contemporary computing systems. Very simply, this...

10.5555/762761.762802 article EN Conference on High Performance Computing (Supercomputing) 2002-11-16
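One way to make per-process counter data scale, in the spirit of the statistical reduction this paper argues for, is to group processes whose counter profiles are statistically similar and report one representative per group instead of raw counts for every process. The greedy tolerance-based grouping below is a hypothetical sketch of that idea, not the paper's actual method.

```python
def cluster_counter_profiles(samples, eps=0.1):
    """Greedily group per-rank counter vectors: a rank joins the first
    cluster whose representative is within a relative tolerance eps on
    every counter; otherwise it seeds a new cluster."""
    def close(a, b):
        return all(abs(x - y) <= eps * max(abs(x), abs(y), 1)
                   for x, y in zip(a, b))
    clusters = []  # list of (representative vector, member ranks)
    for rank, vec in sorted(samples.items()):
        for rep, members in clusters:
            if close(rep, vec):
                members.append(rank)
                break
        else:
            clusters.append((vec, [rank]))
    return [members for _, members in clusters]
```

With tens of thousands of processes, the analyst then inspects a handful of cluster representatives rather than every rank's counters.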

Dynamic linking has many advantages for managing large code bases, but dynamically linked applications have not typically scaled well on high performance computing systems. Splitting a monolithic executable into dynamic shared object (DSO) files decreases compile time for large codes, reduces runtime memory requirements by allowing modules to be loaded and unloaded as needed, and allows common DSOs to be shared among executables. However, launching an application that depends on many DSOs causes a flood of file system operations at program...

10.1145/2464996.2465020 article EN 2013-05-28
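The essence of the fix for that file-system flood is load-once-and-share: only the first request for a DSO on a node touches the shared file system, and every later request reuses the cached result. The class below is a minimal sketch of that caching idea under hypothetical names; it is not the actual protocol of the paper's tool.

```python
class CachingLoader:
    """Node-local DSO cache: the first request for a library reads it from
    the parallel file system (via `fetch`); subsequent requests reuse the
    cached bytes, so N processes cause one PFS read instead of N."""
    def __init__(self, fetch):
        self.fetch = fetch      # function that reads a DSO from the PFS
        self.cache = {}
        self.fs_reads = 0       # how often the PFS was actually hit
    def load(self, path):
        if path not in self.cache:
            self.fs_reads += 1
            self.cache[path] = self.fetch(path)
        return self.cache[path]
```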

The detection and elimination of data races in large-scale OpenMP programs is of critical importance. Unfortunately, today's state-of-the-art race checkers suffer from high memory overheads and/or miss races. In this paper, we present SWORD, a data race detector that significantly improves upon these limitations. SWORD limits the application slowdown and memory usage by utilizing only a bounded, user-adjustable buffer to collect targeted memory accesses. When the buffer fills up, the accesses are compressed and flushed to the file system for later...

10.1109/ipdps.2018.00094 article EN 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2018-05-01
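The bounded-buffer mechanism described above can be sketched in a few lines: accesses accumulate in a fixed-capacity buffer, and each full batch is compressed and flushed for later offline analysis. This illustrates only the buffering idea, under invented record strings, not SWORD's actual trace format.

```python
import zlib

class BoundedAccessLog:
    """Bounded, user-adjustable access buffer: when it fills, the batch is
    zlib-compressed and flushed to `sink` (standing in for the file
    system), keeping the in-memory footprint constant."""
    def __init__(self, capacity, sink):
        self.capacity, self.sink, self.buf = capacity, sink, []
    def record(self, access):
        self.buf.append(access)
        if len(self.buf) >= self.capacity:
            self.flush()
    def flush(self):
        if self.buf:
            self.sink.append(zlib.compress("\n".join(self.buf).encode()))
            self.buf = []
```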

Dynamic analysis techniques help programmers find the root cause of bugs in large-scale parallel applications.

10.1145/2667219 article EN Communications of the ACM 2015-08-24

Debugging large-scale parallel applications is challenging. In most HPC applications, tasks progress in a coordinated fashion, and thus a fault in one task can quickly propagate to other tasks, making it difficult to debug. Finding the least-progressed tasks can significantly reduce the effort to identify the task where the fault originated. However, existing approaches for detecting them suffer low accuracy and large overheads; either they use imprecise static analysis or are unable to infer progress dependence inside loops. We present a loop-aware...

10.1145/2594291.2594336 article EN 2014-05-13
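Once per-task progress dependences are known, picking least-progressed candidates reduces to a small graph query: a task that blocks others but waits on no one is the place to start debugging. The map-based encoding below is a hypothetical sketch of that final step, not the paper's analysis itself.

```python
def least_progressed(waits_on):
    """Given a progress-dependence map waits_on[task] = set of tasks that
    task is blocked on, return the tasks that block others but are
    themselves waiting on no one (the least-progressed candidates)."""
    blocked_on = set()
    for deps in waits_on.values():
        blocked_on |= deps
    return sorted(t for t in blocked_on if not waits_on.get(t))
```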

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales, each tool itself will become a large parallel application - already, debugging the full Blue-Gene/L (BG/L) installation at Lawrence Livermore National Laboratory requires employing 1664 daemons. To reach such sizes and beyond, tools must use scalable communication...

10.5555/1413370.1413397 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2008-11-15

The ability to record and replay program execution helps significantly in debugging non-deterministic MPI applications by reproducing message-receive orders. However, the large amount of data that traditional record-and-replay techniques record precludes their practical applicability to massively parallel applications. In this paper, we propose a new compression algorithm, Clock Delta Compression (CDC), for scalable record and replay. CDC defines a reference order of message receives based on a totally ordered relation using...

10.1145/2807591.2807642 article EN 2015-10-27
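The intuition behind delta-based compression of receive orders can be sketched as follows: if a clock-derived reference order predicts most receives correctly, only the positions where the observed order deviates from it need to be recorded. This is an illustration of the delta idea under invented message names, not CDC's actual encoding.

```python
def cdc_compress(observed, reference):
    """Store only the positions where the observed receive order deviates
    from the reference order; a run that matches the reference costs
    nothing to record."""
    return [(i, m) for i, m in enumerate(observed) if reference[i] != m]

def cdc_decompress(deltas, reference):
    """Replay: start from the reference order and patch in the recorded
    deviations to recover the observed receive order exactly."""
    order = list(reference)
    for i, m in deltas:
        order[i] = m
    return order
```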

The advancement of machine learning techniques and the heterogeneous architectures of most current supercomputers are propelling the demand for large multiscale simulations that can automatically and autonomously couple diverse components and map them to relevant resources to solve complex problems at multiple scales. Nevertheless, despite recent progress in workflow technologies, current capabilities are limited to coupling two scales. In the first-ever demonstration using three scales of resolution, we present a scalable and generalizable...

10.1145/3458817.3476210 article EN 2021-10-21

Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to the challenges of coordination and deployment of heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which can address many of these challenges: ExaWorks is leading a co-design process to create a workflow Software...

10.1109/works54523.2021.00012 article EN 2021-11-01

As High Performance Computing (HPC) workflows increase in complexity, their designers seek to enable the automation and flexibility offered by cloud technologies. Container orchestration through Kubernetes enables highly desirable capabilities but does not satisfy the performance demands of HPC. Tools that automate the lifecycle of Message Passing Interface (MPI)-based applications do not scale, nor does the scheduler provide crucial scheduling capabilities. In this work, we detail our efforts to port the CORAL-2 benchmark...

10.1109/canopie-hpc56864.2022.00011 article EN 2022-11-01

Many tools that target parallel and distributed environments must co-locate a set of daemons with the processes of a target application. However, efficient and portable deployment of these daemons on large-scale systems is an unsolved problem. We overcome this gap with LaunchMON, a scalable, robust, portable, secure, and general-purpose infrastructure for launching tool daemons. Its API allows tool builders to identify all processes of a target job, launch daemons on the relevant nodes, and control daemon interaction. Our results show that LaunchMON scales to very large daemon counts...

10.1109/icpp.2008.63 article EN 2008-09-01

Debugging large-scale parallel applications is challenging. Most existing techniques provide mechanisms for process control but little information about the causes of failures. Most debuggers also scale poorly despite continued growth in supercomputer core counts. Our novel, highly scalable tool helps developers to understand and fix performance failures and correctness problems at scale. It probabilistically infers the least progressed task in MPI programs using Markov models of execution history and dependence...

10.1145/2370816.2370848 article EN 2012-09-19

Debugging large-scale parallel applications is challenging. Most existing techniques provide little information about failure root causes. Further, most debuggers significantly slow down program execution, and run sluggishly with massively parallel applications. This paper presents a novel technique that scalably infers the tasks in a parallel program on which a failure occurred, as well as the code in which it originated. Our technique combines scalable runtime analysis with static analysis to determine the least-progressed task(s) and identify the code lines at which the failure arose. We present...

10.1109/tpds.2014.2314100 article EN IEEE Transactions on Parallel and Distributed Systems 2014-04-21


10.1109/sc.2002.10066 article EN 2002-01-01