- Interconnection Networks and Systems
- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Simulation Techniques and Applications
- Data Visualization and Analytics
- Software-Defined Networks and 5G
- Complex Network Analysis Techniques
- Scientific Computing and Data Management
- Distributed Systems and Fault Tolerance
- Anomaly Detection Techniques and Applications
- Semiconductor Materials and Devices
- Peer-to-Peer Network Technologies
- Topological and Geometric Data Analysis
- Time Series Analysis and Forecasting
- Software System Performance and Reliability
- Radiation Effects in Electronics
- Blockchain Technology in Education and Learning
- Neuroscience and Neural Engineering
- Advanced Memory and Neural Computing
- Low-Power High-Performance VLSI Design
- Simulation and Modeling Applications
- Video Analysis and Summarization
- Data Mining Algorithms and Applications
Sultan Ageng Tirtayasa University
2023
Sandia National Laboratories
2020
Argonne National Laboratory
2015-2020
National Research Council
2020
Institute of Electronics, Computer and Telecommunication Engineering
2020
Prince of Songkla University
2020
Webb Institute
2020
Sandia National Laboratories California
2018-2020
Rensselaer Polytechnic Institute
2012-2015
With the increasing complexity of today's high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, in particular networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art frameworks, however, are constrained by one or more...
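As a rough illustration of the event-driven approach such frameworks build on (this is a toy sketch, not the implementation used in the work above), a network simulation advances a virtual clock by repeatedly popping the earliest pending event from a priority queue and letting its handler schedule follow-on events:

```python
import heapq

def simulate(events, handlers):
    """Minimal sequential discrete-event loop: pop the earliest event,
    invoke its handler, and push any follow-on events it schedules."""
    now = 0.0
    heapq.heapify(events)          # entries are (time, seq, kind, payload)
    seq = len(events)
    while events:
        now, _, kind, payload = heapq.heappop(events)
        for dt, nkind, npayload in handlers[kind](now, payload):
            seq += 1               # tiebreaker keeps heap entries comparable
            heapq.heappush(events, (now + dt, seq, nkind, npayload))
    return now

# Toy model: a packet traverses 3 hops, each adding 5 ns of link latency.
def hop(now, hops_left):
    if hops_left > 0:
        return [(5.0, "hop", hops_left - 1)]
    return []

finish = simulate([(0.0, 0, "hop", 3)], {"hop": hop})
print(finish)  # 15.0: three 5 ns hops
```

Parallel simulators such as ROSS distribute this event queue across many processes and use optimistic scheduling with rollback, which is what makes million-node fidelity tractable.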
A low-latency, low-diameter interconnection network will be an important component of future exascale architectures. The dragonfly topology, a two-level directly connected network, is a candidate for such architectures because its low diameter reduces latency. To date, small-scale simulations with a few thousand nodes have been carried out to examine the topology. However, future machines will have millions of cores on up to 1 million nodes. In this paper, we focus on modeling and simulation of large-scale dragonfly networks using...
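To see why million-node scales arise naturally, the standard balanced-dragonfly sizing heuristic can be worked out in a few lines. This sketch uses the commonly cited balanced parameter relations (a = 2p, h = p, at most a*h + 1 groups), not figures taken from the paper above:

```python
def balanced_dragonfly(p):
    """Size a balanced dragonfly: p terminals per router, a = 2p routers
    per group, h = p global links per router, and a*h + 1 groups (the
    maximum a fully connected group graph supports)."""
    a, h = 2 * p, p
    groups = a * h + 1
    routers = a * groups
    nodes = p * routers
    return groups, routers, nodes

# A modest router radix already reaches million-node scale:
print(balanced_dragonfly(4))   # (33, 264, 1056)
print(balanced_dragonfly(23))  # (1059, 48714, 1120422)
```

The quadratic growth of the group count in p is why dragonflies reach extreme scale with a diameter of only a few hops.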
High-radix, low-diameter dragonfly networks will be a common choice in next-generation supercomputers. Preliminary studies show that random job placement with adaptive routing should be the rule of thumb for utilizing such networks, since it uniformly distributes traffic and alleviates congestion. Nevertheless, in this work we find that while this coupled approach is good at load balancing network traffic, it cannot guarantee the best performance for every job. The improvement for communication-intensive applications comes...
The fat-tree topology is one of the most commonly used network topologies in HPC systems. Vendors support several options that can be configured when deploying fat-tree networks on production systems, such as link bandwidth, number of rails, number of planes, and tapering. This paper showcases the use of simulations to compare the impact of these design options on representative applications, libraries, and multi-job workloads. We present advances in the TraceR-CODES simulation framework that enable this analysis and evaluate its prediction accuracy against...
A high-bandwidth, low-latency interconnect will be a critical component of future exascale systems. The torus network topology, which uses multidimensional links to improve path diversity and exploit locality between nodes, is a potential candidate for such interconnects.
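The locality the torus exploits is easy to make concrete: because every dimension wraps around, the minimal hop count between two nodes takes the shorter direction in each dimension. A small sketch (illustrative only, not code from the work above):

```python
def torus_distance(a, b, dims):
    """Minimal hop count between nodes a and b on a torus whose
    dimension sizes are given by dims; each dimension wraps, so the
    shorter of the two directions is taken per dimension."""
    return sum(min(abs(x - y), k - abs(x - y))
               for x, y, k in zip(a, b, dims))

# 3D 8x8x8 torus: wraparound links shorten otherwise long paths.
print(torus_distance((0, 0, 0), (7, 4, 1), (8, 8, 8)))  # 1 + 4 + 1 = 6
```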
Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are truly representative of expected use. However, analyses are generally bound to specific workload modeling techniques such as synthetic benchmarks or trace replay mechanisms, despite the fact that no single technique is appropriate for all cases. In this work, we present the design of IOWA, a novel I/O workload abstraction that allows arbitrary consumer components to obtain workloads from a range of diverse input sources. Thus, researchers can choose...
HPC systems have shifted to burst buffer storage and high-radix interconnect topologies in order to meet the challenges of large-scale, data-intensive scientific computing. Both of these technologies have been studied in detail independently, but the interaction between them is not well understood. I/O traffic and communication traffic from concurrently scheduled applications may interfere with each other in unexpected ways, and this behavior may vary considerably depending on resource allocation, scheduling, and routing policies. In...
The overall efficiency of an extreme-scale supercomputer largely relies on the performance of its network interconnects. Several state-of-the-art supercomputers use networks based on the increasingly popular Dragonfly topology. It is crucial to study the behavior and performance of different parallel applications running on these networks in order to make optimal system configurations and design choices, such as job scheduling and routing strategies. However, to study these temporal behaviors, we would need a tool to analyze and correlate numerous sets of multivariate...
As supercomputers close in on exascale performance, the increased number of processors and processing power translates to an increased demand on the underlying network interconnect. The Slim Fly topology, a new low-diameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity flit-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and the Co-Design of Exascale Storage (CODES)...
Dragonfly networks are being widely adopted in high-performance computing systems. On these networks, however, interference caused by resource sharing can lead to significant network congestion and performance variability. We present a comparative analysis exploring the trade-off between localizing communication and balancing network traffic. We conduct trace-based simulations for applications with different communication patterns, using multiple job placement policies and routing mechanisms. We perform an in-depth analysis on...
Among the low-diameter, high-radix networks being deployed in next-generation HPC systems, dual-rail fat-tree networks are a promising approach. Adding additional injection connections (rails) to one or more network planes allows multi-rail fat-tree networks to alleviate communication bottlenecks. These multi-rail networks necessitate new design considerations, such as routing choices, job placements, and scalability of rails. We extend our fat-tree model in the CODES parallel simulation framework to support...
Performance modeling of extreme-scale applications on accurate representations of potential architectures is critical for designing next-generation supercomputing systems, because it is impractical to construct a prototype at scale with new network hardware in order to explore designs and policies. However, these simulations often rely on static application traces, which can be difficult to work with because of their size and which lack the flexibility to extend or scale up without rerunning the original application. To address this problem, we have...
High-radix, low-diameter, hierarchical networks based on the Dragonfly topology are common picks for building next-generation HPC systems. However, effective tools are lacking for analyzing the performance of such emerging networks and exploring design choices at scale. In this paper, we present visual analytics methods that couple data aggregation techniques with interactive visualizations of large-scale networks. We create an interactive system based on these techniques. To facilitate the analysis and exploration of network behaviors, our...
Burst buffers (BBs) are increasingly exploited in contemporary supercomputers to bridge the performance gap between compute and storage systems. The design of BBs, particularly the placement of these devices in the underlying network topology, impacts both performance and cost. As the cost of other components such as memory and accelerators increases, it is becoming more important that HPC centers provision BBs tailored to their workloads. This work contributes a provisioning system that provides accurate, multi-tenant simulations to model...
Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach, owing to the computational cost of applying offline algorithms to vast amounts of log data. Analyzing such large data streams is challenging because the rate at which data arrives and the limited time available to comprehend it make it difficult for analysts to examine the data sufficiently without missing important changes or patterns. To support streaming analysis, we introduce a visual analytic framework comprising three modules: data management, analysis, and interactive...
The Dragonfly class of networks is considered a promising interconnect for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource-sharing design. Event-driven network simulators are indispensable tools for navigating such complex system design spaces. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system by using the CODES toolkit. This...
With the rapid growth of machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a great research vehicle for understanding the performance implications of co-running applications on large-scale systems. In this paper, we present Union, a workload manager that provides an automatic framework to facilitate hybrid workload simulation in CODES. Furthermore, we use Union along with CODES to investigate various workloads composed of traditional...
MPI collective operations are a critical and frequently used part of most MPI-based large-scale scientific applications. In previous work, we have enabled the Rensselaer Optimistic Simulation System (ROSS) to predict the performance of point-to-point messaging in high-fidelity million-node network simulations of torus and dragonfly interconnects. The main contribution of this work is an extension of these models to support collective communication using the optimistic event scheduling capability of ROSS. We demonstrate that both...
Critical to the scalability of parallel adaptive simulations are control functions including load balancing, reduced inter-process communication, and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes, which must be transmitted efficiently to avoid performance degradation when neighbors reside on different processors. This article presents an algorithm for creating and deleting data copies, referred to as ghost copies, that localize...
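The core idea behind ghost copies can be sketched in a few lines: each rank needs local copies of the off-rank neighbors of the mesh entities it owns. This is an illustrative toy (the names `partition` and `adjacency` are hypothetical), not the algorithm from the article, which would exchange these lists via MPI and handle deletion and updates:

```python
def ghost_layers(partition, adjacency):
    """For each rank, collect the off-rank neighbors of its owned mesh
    entities; these are the ghost copies that rank must receive.
    `partition` maps entity -> owning rank; `adjacency` maps
    entity -> list of neighboring entities."""
    ghosts = {}
    for ent, owner in partition.items():
        for nbr in adjacency[ent]:
            if partition[nbr] != owner:
                ghosts.setdefault(owner, set()).add(nbr)
    return ghosts

# 1D chain of 6 elements split across two ranks.
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(ghost_layers(part, adj))  # {0: {3}, 1: {2}}
```

With ghosts in place, each rank reads neighborhood data locally instead of issuing a remote query per access.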
Two-tiered direct network topologies such as Dragonflies have been proposed for future post-petascale and exascale machines, since they provide a high-radix, low-diameter, fast interconnection network. Such topologies call for redesigning MPI collective communication algorithms in order to attain the best performance. Yet as increasingly many applications share a machine, it is not clear how these topology-aware algorithms will react to interference with concurrent jobs accessing the same network. In this paper, we study three broadcast...
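For context on what such broadcast algorithms look like, the classic binomial-tree broadcast doubles the set of informed ranks each round, finishing in ceil(log2(n)) rounds. The sketch below illustrates that baseline schedule only; it is not one of the topology-aware variants studied in the paper:

```python
def binomial_broadcast_schedule(n):
    """Schedule of a binomial-tree broadcast among ranks 0..n-1:
    in round r, every rank that already holds the message sends it
    to rank + 2**r (if that rank exists). Returns one list of
    (src, dst) pairs per round."""
    have = {0}
    rounds = []
    r = 0
    while len(have) < n:
        sends = [(src, src + 2**r) for src in sorted(have)
                 if src + 2**r < n]
        have.update(dst for _, dst in sends)
        rounds.append(sends)
        r += 1
    return rounds

sched = binomial_broadcast_schedule(8)
print(len(sched))  # 3 rounds == ceil(log2(8))
print(sched[0])    # [(0, 1)]
```

Topology-aware variants reorder or regroup these sends so that early rounds stay within a Dragonfly group and expensive global links are crossed as few times as possible.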
Network contention between concurrently running jobs on HPC systems is a primary cause of performance variability. Optimizing job allocation and avoiding network sharing are hence crucial to alleviating the potential degradation. In order to do so effectively, an understanding of the interference among jobs and their communication patterns is required. In this work, we choose three representative applications from the DOE Design Forward Project and conduct detailed simulations with a torus network model to analyze both intra- and...