- Advanced Data Storage Technologies
- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Distributed Systems and Fault Tolerance
- Interconnection Networks and Systems
- Scientific Computing and Data Management
- Caching and Content Delivery
- Combustion and Flame Dynamics
- Advanced Combustion Engine Technologies
- Embedded Systems Design Techniques
- Software System Performance and Reliability
- Defense, Military, and Policy Studies
- Military, Security, and Education Studies
- Military and Defense Studies
- Education and Military Integration
- Catalytic Processes in Materials Science
- Cloud Data Security Solutions
- International Human Rights and Reproductive Law
- Trauma, Hemostasis, Coagulopathy, Resuscitation
- Workplace Violence and Bullying
- Radiation Effects in Electronics
- Effects of Environmental Stressors on Livestock
- Sexual Assault and Victimization Studies
- Scheduling and Optimization Algorithms
Argonne National Laboratory
2016-2025
Argonne Leadership Computing Facility
2016-2024
Lawrence Berkeley National Laboratory
2021
Office of Scientific and Technical Information
2012
Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these systems are often designed without a clear understanding of the diverse computational workloads they will support. In this...
Today's top high performance computing systems run applications with hundreds of thousands of processes, contain hundreds of storage nodes, and must meet massive I/O requirements for capacity and performance. These leadership-class systems face daunting challenges to deploying scalable I/O systems. In this paper we present a case study of I/O scalability on Intrepid, the IBM Blue Gene/P system at the Argonne Leadership Computing Facility. Listed among the 5 fastest supercomputers of 2008, Intrepid runs computational science applications with intensive demands...
We examine the I/O behavior of thousands of supercomputing applications "in the wild," by analyzing Darshan logs of over a million jobs representing a combined total of six years of I/O behavior across three leading high-performance computing platforms. We mined these logs to analyze, for each application, all of its runs on a platform; the evolution of an application's I/O behavior over time and across platforms; and a platform's entire workload. Our analysis techniques can help developers and platform owners improve I/O performance and system utilization by quickly identifying underperforming applications and offering...
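As a small illustration of this kind of log mining (using a hypothetical per-job record format, not the actual Darshan schema or the paper's analysis pipeline), the sketch below groups job summaries by application and reports how each application's runs span time on a platform:

```python
# Hypothetical sketch of cross-job mining: group per-job I/O summaries by
# application and summarize each application's history on a platform.
from collections import defaultdict
from statistics import mean

# Each record summarizes one job, e.g. as extracted from an I/O log.
jobs = [
    {"app": "climate_model", "date": "2014-03-01", "gib_written": 120.0},
    {"app": "climate_model", "date": "2015-07-12", "gib_written": 410.0},
    {"app": "qcd_solver",    "date": "2015-01-20", "gib_written": 35.5},
]

by_app = defaultdict(list)
for job in jobs:
    by_app[job["app"]].append(job)

for app, runs in by_app.items():
    runs.sort(key=lambda j: j["date"])
    print(f"{app}: {len(runs)} runs, "
          f"mean {mean(j['gib_written'] for j in runs):.1f} GiB written, "
          f"first {runs[0]['date']}, last {runs[-1]['date']}")
```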
Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types, such as disk, SSD, CPU, memory, and network components, can exhibit performance faults. We made several important observations: faults can convert from one form to another, the cascading chains of root causes and impacts can be long, and faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.
MPI is the most prominent programming model used in scientific computing today. Despite the importance of MPI, however, how applications use it in production is not well understood. This lack of understanding is attributed primarily to the fact that production systems are often wary of incorporating automatic profiling tools to perform such analysis because of concerns about potential performance overheads. In this study, we use a lightweight profiling tool, called Autoperf, to log MPI usage characteristics on a large IBM BG/Q supercomputing system...
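The sketch below illustrates the general idea of lightweight MPI usage logging; it is not Autoperf itself, and the mpi4py wrappers, counters, and aggregation step are assumptions for illustration:

```python
# Hypothetical sketch: count and time selected MPI calls per rank, then
# gather per-rank statistics to rank 0 at the end of the job.
import time
from collections import defaultdict
from mpi4py import MPI

_stats = defaultdict(lambda: [0, 0.0])  # call name -> [count, total seconds]

def timed(name, fn):
    """Wrap an MPI call so each invocation is counted and timed."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        rec = _stats[name]
        rec[0] += 1
        rec[1] += time.perf_counter() - t0
        return result
    return wrapper

comm = MPI.COMM_WORLD
allreduce = timed("Allreduce", comm.allreduce)
barrier = timed("Barrier", comm.Barrier)

# ... application code would call allreduce()/barrier() as usual ...
total = allreduce(comm.Get_rank())
barrier()

# One log record per job: gather per-rank usage statistics to rank 0.
summary = comm.gather(dict(_stats), root=0)
if comm.Get_rank() == 0:
    print(summary)
```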
The increasing complexity of HPC systems has introduced new sources of variability, which can contribute to significant differences in the run-to-run performance of applications. With components at various levels of the system contributing to variability, application developers and users are now faced with the difficult task of running and tuning their applications in an environment where performance measurements can vary by as much as a factor of two to three. In this study, we classify, quantify, and present ways to mitigate variability on Cray XC systems with Intel Xeon Phi...
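A minimal sketch of how run-to-run variability can be quantified; the coefficient of variation and the max/min spread are standard metrics, but the wall-clock times below are made up for illustration:

```python
# Quantify run-to-run variability across repeated runs of the same job.
import statistics

# Hypothetical wall-clock times (seconds) for repeated runs of one application.
runtimes = [412.0, 455.3, 430.1, 792.6, 401.8, 688.4]

mean = statistics.mean(runtimes)
cv = statistics.stdev(runtimes) / mean   # coefficient of variation
spread = max(runtimes) / min(runtimes)   # worst-case slowdown vs. best run

print(f"mean = {mean:.1f} s, CV = {cv:.2%}, max/min = {spread:.2f}x")
```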
Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results are generally not portable to other systems. I/O profiling tools can help address this challenge, but most existing tools only instrument specific...
In recent years, half precision floating-point arithmetic has gained wide support in hardware and in the software stack, thanks to the advance of artificial intelligence and machine learning applications. Operating at half precision can significantly reduce the memory footprint compared with operating at single or double precision. For memory-bound applications such as time domain wave simulations, this is an attractive feature. However, the narrower width of the data format can lead to degradation of solution quality due to larger roundoff errors. In this work, we...
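A small, self-contained illustration (not the paper's method or solver) of how roundoff grows in half precision: the same simple 1-D upwind update is advanced in float16 and float64 and the results are compared:

```python
# Compare the same stencil update carried out in half and double precision.
import numpy as np

n, steps, c = 1024, 2000, 0.4
x = np.linspace(0.0, 2.0 * np.pi, n)

def advance(u, dtype, steps=steps):
    """Advance a simple upwind update u_i <- u_i - c*(u_i - u_{i-1}) on a periodic domain."""
    u = u.astype(dtype)
    for _ in range(steps):
        u = (u - dtype(c) * (u - np.roll(u, 1))).astype(dtype)
    return u

u0 = np.sin(x)
err = np.abs(advance(u0, np.float16) - advance(u0, np.float64))
print("max |fp16 - fp64| after", steps, "steps:", float(err.max()))
```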
I/O efficiency is essential to productivity in scientific computing, especially as many scientific domains become more data-intensive. Many characterization tools have been used to elucidate specific aspects of parallel I/O performance, but analyzing components of complex I/O subsystems in isolation fails to provide insight into critical questions: how do the I/O components interact, what are reasonable expectations for application performance, and what are the underlying causes of I/O performance problems? To address these questions while capitalizing on existing...
In this paper, we propose an approach to improving the I/O performance of the IBM Blue Gene/Q supercomputing system using a novel framework that can be integrated into high-level applications. We take advantage of the system's tremendous computing resources and the interconnection bandwidth among compute nodes to efficiently exploit the available I/O bandwidth. This framework focuses on lossless data compression, topology-aware data movement, and subfiling. The efficacy of the solution is demonstrated with microbenchmarks and an application-level benchmark.
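A minimal sketch of the lossless-compression and subfiling ideas, assuming zlib, NumPy, and a single writer; the file name and block contents are hypothetical, and this is not the paper's framework:

```python
# Compress a process's data block losslessly before writing it to its own
# subfile, then verify the round trip on read-back.
import zlib
import numpy as np

block = np.random.default_rng(0).integers(0, 100, size=1_000_000, dtype=np.int32)
raw = block.tobytes()
compressed = zlib.compress(raw, level=6)

print(f"raw {len(raw)} B -> compressed {len(compressed)} B "
      f"({len(compressed) / len(raw):.1%} of original)")

# One subfile per writer (the subfiling idea); a real framework would also
# coordinate placement across compute nodes.
with open("block_rank0.z", "wb") as f:
    f.write(compressed)

restored = np.frombuffer(zlib.decompress(open("block_rank0.z", "rb").read()),
                         dtype=np.int32)
assert np.array_equal(restored, block)
```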
In preparation for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report, the climate community will run the Coupled Model Intercomparison Project phase 5 (CMIP-5) experiments, which are designed to answer crucial questions about future regional climate change and the results of carbon feedback under different mitigation scenarios. The CMIP-5 experiments will generate petabytes of data that must be replicated seamlessly, reliably, and quickly to hundreds of research teams around the globe. As an end-to-end test...
Application performance variability caused by network contention is a major issue on dragonfly-based systems. This work-in-progress study makes two contributions. First, we analyze real workload logs and conduct application experiments on the production system Theta at Argonne to evaluate performance variability. We find a strong correlation between network utilization and performance, where high network utilization (e.g., above 95%) can cause up to 21% degradation in application performance. Next, driven by this key finding, we investigate a scheduling policy to mitigate...
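A toy sketch of a contention-aware admission rule in the spirit described above; the 95% threshold echoes the finding quoted in the abstract, but the job labels and policy details are assumptions, not the paper's scheduling policy:

```python
# Hold communication-heavy jobs back while measured network utilization is
# above a threshold; let compute-bound jobs through.
from dataclasses import dataclass

UTIL_THRESHOLD = 0.95  # assumed cutoff, motivated by the ~95% observation above

@dataclass
class Job:
    name: str
    comm_intensive: bool  # hypothetical label from historical profiling

def schedulable(job: Job, network_utilization: float) -> bool:
    """Admit a job unless it is communication-heavy and the network is saturated."""
    return not (job.comm_intensive and network_utilization >= UTIL_THRESHOLD)

queue = [Job("qcd_solver", True), Job("param_sweep", False)]
for job in queue:
    decision = "run now" if schedulable(job, network_utilization=0.97) else "defer"
    print(job.name, decision)
```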
High Performance Computing (HPC) is an important method for scientific discovery via large-scale simulation, data analysis, or artificial intelligence. Leadership-class supercomputers are expensive, but essential to run large HPC applications. The Petascale era of supercomputing began in 2008, with the first machines achieving performance in excess of one petaflops, and with the advent of new machines in 2021 (e.g., Aurora, Frontier), the Exascale era will soon begin. However, the high theoretical computing capability (i.e., peak FLOPS) of a machine...
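A worked example of how theoretical peak FLOPS is computed and why achieved performance often sits far below it; the node count, clock rate, FLOPs per cycle, and 5% efficiency figure are illustrative assumptions, not measurements of any specific machine:

```python
# Theoretical peak = nodes x cores/node x clock x FLOPs/cycle (illustrative values).
nodes = 4_392
cores_per_node = 64
clock_ghz = 1.3
flops_per_cycle = 32  # e.g., two 512-bit FMA units at double precision

peak_tflops = nodes * cores_per_node * clock_ghz * flops_per_cycle / 1e3
achieved_tflops = 0.05 * peak_tflops  # many real applications reach only a few percent

print(f"peak ~= {peak_tflops:.0f} TFLOPS; a 5%-efficient application ~= {achieved_tflops:.0f} TFLOPS")
```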
In order to provide a stepping stone from the Argonne Leadership Computing Facility's (ALCF) world-class production 10 petaFLOPS IBM BlueGene/Q system, Mira, to its next generation 200 petaFLOPS 3rd-generation Intel Xeon Phi based system, Aurora, ALCF worked with Intel and Cray to acquire an 8.6 petaFLOPS 2nd-generation Intel Xeon Phi based system named Theta. Theta was delivered, installed, integrated, and accepted on an aggressive schedule, in just over 3 months. We will detail how we were able to successfully meet that deadline as well as lessons learned during the process.
Growing evidence in the scientific computing community indicates that parallel file systems are not sufficient for all HPC storage workloads. This realization has motivated extensive research into new storage system designs. The question of which design we should turn to implies that there could be a single answer satisfying a wide range of diverse applications. We argue that such a generic solution does not exist. Instead, custom data services should be designed and tailored to the needs of specific applications on specific hardware. Furthermore, close...
A closed-cycle gasoline compression ignition (GCI) engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial computational fluid dynamics (CFD) code as it was scaled up to 4096 cores on an IBM Blue Gene/Q (BG/Q) supercomputer. The test case has 9 × 10⁶ cells at TDC, with a fixed mesh size of 0.15 mm, and run configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output (I/O)...
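A small sketch of the strong-scaling bookkeeping such a profiling study reports, namely speedup and parallel efficiency relative to the smallest core count; the timings below are hypothetical, not the paper's data:

```python
# Strong-scaling metrics from hypothetical wall-clock times at each core count.
base_cores = 128
timings = {128: 1000.0, 256: 520.0, 512: 275.0, 1024: 150.0, 2048: 90.0, 4096: 60.0}

t_base = timings[base_cores]
for cores, t in sorted(timings.items()):
    speedup = t_base / t
    efficiency = speedup / (cores / base_cores)
    print(f"{cores:5d} cores: speedup {speedup:6.1f}x, efficiency {efficiency:5.1%}")
```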
High-performance computing (HPC) and distributed systems rely on a diverse collection of system software to provide application services, including file systems, schedulers, and web services. Such software services must manage highly concurrent requests, interact with a wide range of resources, and scale well in order to be successful. Unfortunately, no single programming model for concurrency currently offers optimal performance and productivity for all of these tasks. While numerous libraries, languages, and language extensions...
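A minimal sketch of the concurrency problem being described: a toy service that overlaps many in-flight requests. It uses Python's asyncio for brevity; HPC system software typically relies on threads, events, or lightweight tasks instead, and nothing here reflects a specific library discussed above:

```python
# Multiplex many concurrent requests onto a single event loop.
import asyncio

async def handle_request(request_id: int) -> str:
    await asyncio.sleep(0.01)  # stands in for disk or network work
    return f"request {request_id} done"

async def main() -> None:
    # Hundreds of requests in flight at once, completed concurrently.
    results = await asyncio.gather(*(handle_request(i) for i in range(100)))
    print(len(results), "requests completed")

asyncio.run(main())
```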