- Advanced Data Storage Technologies
- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Distributed Systems and Fault Tolerance
- Interconnection Networks and Systems
- Scientific Computing and Data Management
- Caching and Content Delivery
- Combustion and Flame Dynamics
- Advanced Combustion Engine Technologies
- Embedded Systems Design Techniques
- Software System Performance and Reliability
- Defense, Military, and Policy Studies
- Military, Security, and Education Studies
- Military and Defense Studies
- Education and Military Integration
- Catalytic Processes in Materials Science
- Cloud Data Security Solutions
- International Human Rights and Reproductive Law
- Trauma, Hemostasis, Coagulopathy, Resuscitation
- Workplace Violence and Bullying
- Radiation Effects in Electronics
- Effects of Environmental Stressors on Livestock
- Sexual Assault and Victimization Studies
- Scheduling and Optimization Algorithms
Argonne National Laboratory
2016-2025
Argonne Leadership Computing Facility
2016-2024
Lawrence Berkeley National Laboratory
2021
Office of Scientific and Technical Information
2012
Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these systems are often designed without a clear understanding of the diverse computational workloads they will support. In this...
Today's top high performance computing systems run applications with hundreds of thousands of processes, contain hundreds of storage nodes, and must meet massive I/O requirements for capacity and performance. These leadership-class systems face daunting challenges to deploying scalable I/O systems. In this paper we present a case study of I/O scalability on Intrepid, the IBM Blue Gene/P system at the Argonne Leadership Computing Facility. Listed among the 5 fastest supercomputers of 2008, Intrepid runs computational science applications with intensive demands...
We examine the I/O behavior of thousands of supercomputing applications "in the wild," by analyzing Darshan logs of over a million jobs representing a combined total of six years of I/O behavior across three leading high-performance computing platforms. We mined these logs to analyze, for each application, all of its runs on a platform; the evolution of an application's I/O behavior over time and across platforms; and a platform's entire workload. Our analysis techniques can help developers and platform owners improve I/O performance and system utilization by quickly identifying underperforming applications and offering...
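As a small illustration of this kind of log mining (using a hypothetical per-job record format, not the actual Darshan schema or the paper's analysis pipeline), the sketch below groups job summaries by application and reports how each application's runs span time on a platform:

```python
# Hypothetical sketch of cross-job mining: group per-job I/O summaries by
# application and summarize each application's history on a platform.
from collections import defaultdict
from statistics import mean

# Each record summarizes one job, e.g. as extracted from an I/O log.
jobs = [
    {"app": "climate_model", "date": "2014-03-01", "gib_written": 120.0},
    {"app": "climate_model", "date": "2015-07-12", "gib_written": 410.0},
    {"app": "qcd_solver",    "date": "2015-01-20", "gib_written": 35.5},
]

by_app = defaultdict(list)
for job in jobs:
    by_app[job["app"]].append(job)

for app, runs in by_app.items():
    runs.sort(key=lambda j: j["date"])
    print(f"{app}: {len(runs)} runs, "
          f"mean {mean(j['gib_written'] for j in runs):.1f} GiB written, "
          f"first {runs[0]['date']}, last {runs[-1]['date']}")
```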
Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types, such as disk, SSD, CPU, memory, and network components, can exhibit performance faults. We made several important observations: faults can convert from one form to another, the cascading chains of root causes and impacts can be long, and faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.
MPI is the most prominent programming model used in scientific computing today. Despite the importance of MPI, however, how applications use it in production is not well understood. This lack of understanding is attributed primarily to the fact that production systems are often wary of incorporating automatic profiling tools to perform such analysis because of concerns about potential performance overheads. In this study, we use a lightweight profiling tool, called Autoperf, to log MPI usage characteristics on a large IBM BG/Q supercomputing system...
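The sketch below illustrates the general idea of lightweight MPI usage logging; it is not Autoperf itself, and the mpi4py wrappers, counters, and aggregation step are assumptions for illustration:

```python
# Hypothetical sketch: count and time selected MPI calls per rank, then
# gather per-rank statistics to rank 0 at the end of the job.
import time
from collections import defaultdict
from mpi4py import MPI

_stats = defaultdict(lambda: [0, 0.0])  # call name -> [count, total seconds]

def timed(name, fn):
    """Wrap an MPI call so each invocation is counted and timed."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        rec = _stats[name]
        rec[0] += 1
        rec[1] += time.perf_counter() - t0
        return result
    return wrapper

comm = MPI.COMM_WORLD
allreduce = timed("Allreduce", comm.allreduce)
barrier = timed("Barrier", comm.Barrier)

# ... application code would call allreduce()/barrier() as usual ...
total = allreduce(comm.Get_rank())
barrier()

# One log record per job: gather per-rank usage statistics to rank 0.
summary = comm.gather(dict(_stats), root=0)
if comm.Get_rank() == 0:
    print(summary)
```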
The increasing complexity of HPC systems has introduced new sources of variability, which can contribute to significant differences in the run-to-run performance of applications. With components at various levels of the system contributing to variability, application developers and users are now faced with the difficult task of running and tuning their applications in an environment where performance measurements can vary by as much as a factor of two to three. In this study, we classify, quantify, and present ways to mitigate variability on Cray XC systems with Intel Xeon Phi...
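A minimal sketch of how run-to-run variability can be quantified; the coefficient of variation and the max/min spread are standard metrics, but the wall-clock times below are made up for illustration:

```python
# Quantify run-to-run variability across repeated runs of the same job.
import statistics

# Hypothetical wall-clock times (seconds) for repeated runs of one application.
runtimes = [412.0, 455.3, 430.1, 792.6, 401.8, 688.4]

mean = statistics.mean(runtimes)
cv = statistics.stdev(runtimes) / mean   # coefficient of variation
spread = max(runtimes) / min(runtimes)   # worst-case slowdown vs. best run

print(f"mean = {mean:.1f} s, CV = {cv:.2%}, max/min = {spread:.2f}x")
```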
Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results are generally not portable to other systems. I/O profiling tools can help address this challenge, but most existing tools only instrument specific...
In recent years, half precision floating-point arithmetic has gained wide support in hardware and in the software stack, thanks to the advance of artificial intelligence and machine learning applications. Operating at half precision can significantly reduce the memory footprint compared with operating at single or double precision. For memory-bound applications such as time domain wave simulations, this is an attractive feature. However, the narrower width of the data format can lead to degradation of solution quality due to larger roundoff errors. In this work, we...
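A small, self-contained illustration (not the paper's method or solver) of how roundoff grows in half precision: the same simple 1-D upwind update is advanced in float16 and float64 and the results are compared:

```python
# Compare the same stencil update carried out in half and double precision.
import numpy as np

n, steps, c = 1024, 2000, 0.4
x = np.linspace(0.0, 2.0 * np.pi, n)

def advance(u, dtype, steps=steps):
    """Advance a simple upwind update u_i <- u_i - c*(u_i - u_{i-1}) on a periodic domain."""
    u = u.astype(dtype)
    for _ in range(steps):
        u = (u - dtype(c) * (u - np.roll(u, 1))).astype(dtype)
    return u

u0 = np.sin(x)
err = np.abs(advance(u0, np.float16) - advance(u0, np.float64))
print("max |fp16 - fp64| after", steps, "steps:", float(err.max()))
```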
I/O efficiency is essential to productivity in scientific computing, especially as many scientific domains become more data-intensive. Many characterization tools have been used to elucidate specific aspects of parallel I/O performance, but analyzing components of complex I/O subsystems in isolation fails to provide insight into critical questions: how do the I/O components interact, what are reasonable expectations for application performance, and what are the underlying causes of I/O performance problems? To address these questions while capitalizing on existing...
In this paper, we propose an approach to improving the I/O performance of the IBM Blue Gene/Q supercomputing system using a novel framework that can be integrated into high-level applications. We take advantage of the system's tremendous computing resources and the interconnection bandwidth among compute nodes to efficiently exploit the available I/O bandwidth. This framework focuses on lossless data compression, topology-aware data movement, and subfiling. The efficacy of the solution is demonstrated with microbenchmarks and an application-level benchmark.
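A minimal sketch of the lossless-compression and subfiling ideas, assuming zlib, NumPy, and a single writer; the file name and block contents are hypothetical, and this is not the paper's framework:

```python
# Compress a process's data block losslessly before writing it to its own
# subfile, then verify the round trip on read-back.
import zlib
import numpy as np

block = np.random.default_rng(0).integers(0, 100, size=1_000_000, dtype=np.int32)
raw = block.tobytes()
compressed = zlib.compress(raw, level=6)

print(f"raw {len(raw)} B -> compressed {len(compressed)} B "
      f"({len(compressed) / len(raw):.1%} of original)")

# One subfile per writer (the subfiling idea); a real framework would also
# coordinate placement across compute nodes.
with open("block_rank0.z", "wb") as f:
    f.write(compressed)

restored = np.frombuffer(zlib.decompress(open("block_rank0.z", "rb").read()),
                         dtype=np.int32)
assert np.array_equal(restored, block)
```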
In preparation for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report, the climate community will run the Coupled Model Intercomparison Project phase 5 (CMIP-5) experiments, which are designed to answer crucial questions about future regional climate change and the results of carbon feedback under different mitigation scenarios. The CMIP-5 experiments will generate petabytes of data that must be replicated seamlessly, reliably, and quickly to hundreds of research teams around the globe. As an end-to-end test...
Application performance variability caused by network contention is a major issue on dragonfly-based systems. This work-in-progress study makes two contributions. First, we analyze real workload logs and conduct application experiments on the production system Theta at Argonne to evaluate performance variability. We find a strong correlation between network utilization and performance, where high network utilization (e.g., above 95%) can cause up to 21% degradation in application performance. Next, driven by this key finding, we investigate a scheduling policy to mitigate...
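A toy sketch of a contention-aware admission rule in the spirit described above; the 95% threshold echoes the finding quoted in the abstract, but the job labels and policy details are assumptions, not the paper's scheduling policy:

```python
# Hold communication-heavy jobs back while measured network utilization is
# above a threshold; let compute-bound jobs through.
from dataclasses import dataclass

UTIL_THRESHOLD = 0.95  # assumed cutoff, motivated by the ~95% observation above

@dataclass
class Job:
    name: str
    comm_intensive: bool  # hypothetical label from historical profiling

def schedulable(job: Job, network_utilization: float) -> bool:
    """Admit a job unless it is communication-heavy and the network is saturated."""
    return not (job.comm_intensive and network_utilization >= UTIL_THRESHOLD)

queue = [Job("qcd_solver", True), Job("param_sweep", False)]
for job in queue:
    decision = "run now" if schedulable(job, network_utilization=0.97) else "defer"
    print(job.name, decision)
```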
High Performance Computing (HPC) is an important method for scientific discovery via large-scale simulation, data analysis, or artificial intelligence. Leadership-class supercomputers are expensive, but essential to run large HPC applications. The Petascale era of supercomputing began in 2008, with the first machines achieving performance in excess of one petaflops, and with the advent of new machines in 2021 (e.g., Aurora, Frontier), the Exascale era will soon begin. However, the high theoretical computing capability (i.e., peak FLOPS) of a machine...
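A worked example of how theoretical peak FLOPS is computed and why achieved performance often sits far below it; the node count, clock rate, FLOPs per cycle, and 5% efficiency figure are illustrative assumptions, not measurements of any specific machine:

```python
# Theoretical peak = nodes x cores/node x clock x FLOPs/cycle (illustrative values).
nodes = 4_392
cores_per_node = 64
clock_ghz = 1.3
flops_per_cycle = 32  # e.g., two 512-bit FMA units at double precision

peak_tflops = nodes * cores_per_node * clock_ghz * flops_per_cycle / 1e3
achieved_tflops = 0.05 * peak_tflops  # many real applications reach only a few percent

print(f"peak ~= {peak_tflops:.0f} TFLOPS; a 5%-efficient application ~= {achieved_tflops:.0f} TFLOPS")
```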
In order to provide a stepping stone from the Argonne Leadership Computing Facility's (ALCF) world-class production 10 petaFLOPS IBM BlueGene/Q system, Mira, to its next generation 200 petaFLOPS 3rd-generation Intel Xeon Phi based system, Aurora, ALCF worked with Intel and Cray to acquire an 8.6 petaFLOPS 2nd-generation Intel Xeon Phi based system named Theta. Theta was delivered, installed, integrated, and accepted on an aggressive schedule, in just over 3 months. We will detail how we were able to successfully meet that deadline as well as lessons learned during the process.
Growing evidence in the scientific computing community indicates that parallel file systems are not sufficient for all HPC storage workloads. This realization has motivated extensive research into new storage system designs. The question of which design we should turn to implies that there could be a single answer satisfying a wide range of diverse applications. We argue that such a generic solution does not exist. Instead, custom data services should be designed and tailored to the needs of specific applications on specific hardware. Furthermore, close...
A closed-cycle gasoline compression ignition (GCI) engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial computational fluid dynamics (CFD) code as it was scaled up to 4096 cores on an IBM Blue Gene/Q (BG/Q) supercomputer. The test case has 9 × 10⁶ cells at TDC, with a fixed mesh size of 0.15 mm, and run configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output (I/O)...
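A small sketch of the strong-scaling bookkeeping such a profiling study reports, namely speedup and parallel efficiency relative to the smallest core count; the timings below are hypothetical, not the paper's data:

```python
# Strong-scaling metrics from hypothetical wall-clock times at each core count.
base_cores = 128
timings = {128: 1000.0, 256: 520.0, 512: 275.0, 1024: 150.0, 2048: 90.0, 4096: 60.0}

t_base = timings[base_cores]
for cores, t in sorted(timings.items()):
    speedup = t_base / t
    efficiency = speedup / (cores / base_cores)
    print(f"{cores:5d} cores: speedup {speedup:6.1f}x, efficiency {efficiency:5.1%}")
```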
High-performance computing (HPC) and distributed systems rely on a diverse collection of system software to provide application services, including file systems, schedulers, and web services. Such software services must manage highly concurrent requests, interact with a wide range of resources, and scale well in order to be successful. Unfortunately, no single programming model for concurrency currently offers optimal performance and productivity for all of these tasks. While numerous libraries, languages, and language extensions...
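A minimal sketch of the concurrency problem being described: a toy service that overlaps many in-flight requests. It uses Python's asyncio for brevity; HPC system software typically relies on threads, events, or lightweight tasks instead, and nothing here reflects a specific library discussed above:

```python
# Multiplex many concurrent requests onto a single event loop.
import asyncio

async def handle_request(request_id: int) -> str:
    await asyncio.sleep(0.01)  # stands in for disk or network work
    return f"request {request_id} done"

async def main() -> None:
    # Hundreds of requests in flight at once, completed concurrently.
    results = await asyncio.gather(*(handle_request(i) for i in range(100)))
    print(len(results), "requests completed")

asyncio.run(main())
```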