- Interconnection Networks and Systems
- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Simulation Techniques and Applications
- Data Visualization and Analytics
- Software-Defined Networks and 5G
- Complex Network Analysis Techniques
- Scientific Computing and Data Management
- Distributed Systems and Fault Tolerance
- Anomaly Detection Techniques and Applications
- Semiconductor Materials and Devices
- Peer-to-Peer Network Technologies
- Topological and Geometric Data Analysis
- Time Series Analysis and Forecasting
- Software System Performance and Reliability
- Radiation Effects in Electronics
- Blockchain Technology in Education and Learning
- Neuroscience and Neural Engineering
- Advanced Memory and Neural Computing
- Low-Power High-Performance VLSI Design
- Simulation and Modeling Applications
- Video Analysis and Summarization
- Data Mining Algorithms and Applications
Sultan Ageng Tirtayasa University
2023
Sandia National Laboratories
2020
Argonne National Laboratory
2015-2020
National Research Council
2020
Institute of Electronics, Computer and Telecommunication Engineering
2020
Prince of Songkla University
2020
Webb Institute
2020
Sandia National Laboratories California
2018-2020
Rensselaer Polytechnic Institute
2012-2015
With the increasing complexity of today's high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, in particular networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art frameworks, however, are constrained by one or more...
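As a rough illustration of the event-driven approach such frameworks build on (this is a toy sketch, not the implementation used in the work above), a network simulation advances a virtual clock by repeatedly popping the earliest pending event from a priority queue and letting its handler schedule follow-on events:

```python
import heapq

def simulate(events, handlers):
    """Minimal sequential discrete-event loop: pop the earliest event,
    invoke its handler, and push any follow-on events it schedules."""
    now = 0.0
    heapq.heapify(events)          # entries are (time, seq, kind, payload)
    seq = len(events)
    while events:
        now, _, kind, payload = heapq.heappop(events)
        for dt, nkind, npayload in handlers[kind](now, payload):
            seq += 1               # tiebreaker keeps heap entries comparable
            heapq.heappush(events, (now + dt, seq, nkind, npayload))
    return now

# Toy model: a packet traverses 3 hops, each adding 5 ns of link latency.
def hop(now, hops_left):
    if hops_left > 0:
        return [(5.0, "hop", hops_left - 1)]
    return []

finish = simulate([(0.0, 0, "hop", 3)], {"hop": hop})
print(finish)  # 15.0: three 5 ns hops
```

Parallel simulators such as ROSS distribute this event queue across many processes and use optimistic scheduling with rollback, which is what makes million-node fidelity tractable.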
A low-latency, low-diameter interconnection network will be an important component of future exascale architectures. The dragonfly topology, a two-level directly connected network, is a candidate for such architectures because its low diameter reduces latency. To date, small-scale simulations with a few thousand nodes have been carried out to examine the topology. However, future machines will have millions of cores on up to 1 million nodes. In this paper, we focus on modeling and simulation of large-scale dragonfly networks using...
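To see why million-node scales arise naturally, the standard balanced-dragonfly sizing heuristic can be worked out in a few lines. This sketch uses the commonly cited balanced parameter relations (a = 2p, h = p, at most a*h + 1 groups), not figures taken from the paper above:

```python
def balanced_dragonfly(p):
    """Size a balanced dragonfly: p terminals per router, a = 2p routers
    per group, h = p global links per router, and a*h + 1 groups (the
    maximum a fully connected group graph supports)."""
    a, h = 2 * p, p
    groups = a * h + 1
    routers = a * groups
    nodes = p * routers
    return groups, routers, nodes

# A modest router radix already reaches million-node scale:
print(balanced_dragonfly(4))   # (33, 264, 1056)
print(balanced_dragonfly(23))  # (1059, 48714, 1120422)
```

The quadratic growth of the group count in p is why dragonflies reach extreme scale with a diameter of only a few hops.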
High-radix, low-diameter dragonfly networks will be a common choice in next-generation supercomputers. Preliminary studies show that random job placement with adaptive routing should be the rule of thumb for utilizing such networks, since it uniformly distributes traffic and alleviates congestion. Nevertheless, in this work we find that while this coupled approach is good at load balancing network traffic, it cannot guarantee the best performance for every job. The improvement for communication-intensive applications comes...
The fat-tree topology is one of the most commonly used network topologies in HPC systems. Vendors support several options that can be configured when deploying fat-tree networks on production systems, such as link bandwidth, number of rails, number of planes, and tapering. This paper showcases the use of simulations to compare the impact of these design options on representative applications, libraries, and multi-job workloads. We present advances in the TraceR-CODES simulation framework that enable this analysis and evaluate its prediction accuracy against...
A high-bandwidth, low-latency interconnect will be a critical component of future exascale systems. The torus network topology, which uses multidimensional links to improve path diversity and exploit locality between nodes, is a potential candidate for such interconnects.
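The locality the torus exploits is easy to make concrete: because every dimension wraps around, the minimal hop count between two nodes takes the shorter direction in each dimension. A small sketch (illustrative only, not code from the work above):

```python
def torus_distance(a, b, dims):
    """Minimal hop count between nodes a and b on a torus whose
    dimension sizes are given by dims; each dimension wraps, so the
    shorter of the two directions is taken per dimension."""
    return sum(min(abs(x - y), k - abs(x - y))
               for x, y, k in zip(a, b, dims))

# 3D 8x8x8 torus: wraparound links shorten otherwise long paths.
print(torus_distance((0, 0, 0), (7, 4, 1), (8, 8, 8)))  # 1 + 4 + 1 = 6
```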
Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are truly representative of expected use. However, analyses are generally bound to specific workload modeling techniques such as synthetic benchmarks or trace replay mechanisms, despite the fact that no single technique is appropriate for all cases. In this work, we present the design of IOWA, a novel I/O workload abstraction that allows arbitrary consumer components to obtain workloads from a range of diverse input sources. Thus, researchers can choose...
HPC systems have shifted to burst buffer storage and high-radix interconnect topologies in order to meet the challenges of large-scale, data-intensive scientific computing. Both of these technologies have been studied in detail independently, but the interaction between them is not well understood. I/O traffic and communication traffic from concurrently scheduled applications may interfere with each other in unexpected ways, and this behavior may vary considerably depending on resource allocation, scheduling, and routing policies. In...
The overall efficiency of an extreme-scale supercomputer largely relies on the performance of its network interconnects. Several state-of-the-art supercomputers use networks based on the increasingly popular Dragonfly topology. It is crucial to study the behavior and performance of different parallel applications running on these networks in order to make optimal system configurations and design choices, such as job scheduling and routing strategies. However, to study these temporal behaviors, we would need a tool to analyze and correlate numerous sets of multivariate...
As supercomputers close in on exascale performance, the increased number of processors and processing power translates to an increased demand on the underlying network interconnect. The Slim Fly topology, a new low-diameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity flit-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and the Co-Design of Exascale Storage (CODES)...
Dragonfly networks are being widely adopted in high-performance computing systems. On these networks, however, interference caused by resource sharing can lead to significant network congestion and performance variability. We present a comparative analysis exploring the trade-off between localizing communication and balancing network traffic. We conduct trace-based simulations for applications with different communication patterns, using multiple job placement policies and routing mechanisms. We perform an in-depth analysis on...
Among the low-diameter, high-radix networks being deployed in next-generation HPC systems, dual-rail fat-tree networks are a promising approach. Adding additional injection connections (rails) to one or more network planes allows multi-rail fat-tree networks to alleviate communication bottlenecks. These multi-rail networks necessitate new design considerations, such as routing choices, job placements, and scalability of rails. We extend our fat-tree model in the CODES parallel simulation framework to support...
Performance modeling of extreme-scale applications on accurate representations of potential architectures is critical for designing next-generation supercomputing systems, because it is impractical to construct a prototype at scale with new network hardware in order to explore designs and policies. However, these simulations often rely on static application traces, which can be difficult to work with because of their size and which lack the flexibility to extend or scale up without rerunning the original application. To address this problem, we have...
High-radix, low-diameter, hierarchical networks based on the Dragonfly topology are common picks for building next-generation HPC systems. However, effective tools are lacking for analyzing the performance of such emerging networks and exploring design choices at scale. In this paper, we present visual analytics methods that couple data aggregation techniques with interactive visualizations of large-scale networks. We create an interactive system based on these techniques. To facilitate the analysis and exploration of network behaviors, our...
Burst buffers (BBs) are increasingly exploited in contemporary supercomputers to bridge the performance gap between compute and storage systems. The design of BBs, particularly the placement of these devices in the underlying network topology, impacts both performance and cost. As the cost of other components such as memory and accelerators increases, it is becoming more important that HPC centers provision BBs tailored to their workloads. This work contributes a provisioning system that provides accurate, multi-tenant simulations to model...
Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach, owing to the computational cost of applying offline algorithms to vast amounts of log data. Analyzing such large data streams is challenging because the rate at which data arrives and the limited time available to comprehend it make it difficult for analysts to examine the data sufficiently without missing important changes or patterns. To support streaming analysis, we introduce a visual analytic framework comprising three modules: data management, analysis, and interactive...
The Dragonfly class of networks is considered a promising interconnect for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource-sharing design. Event-driven network simulators are indispensable tools for navigating such complex system design spaces. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system by using the CODES toolkit. This...
With the rapid growth of machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a great research vehicle for understanding the performance implications of co-running applications on large-scale systems. In this paper, we present Union, a workload manager that provides an automatic framework to facilitate hybrid workload simulation in CODES. Furthermore, we use Union along with CODES to investigate various workloads composed of traditional...
MPI collective operations are a critical and frequently used part of most MPI-based large-scale scientific applications. In previous work, we have enabled the Rensselaer Optimistic Simulation System (ROSS) to predict the performance of point-to-point messaging in high-fidelity million-node network simulations of torus and dragonfly interconnects. The main contribution of this work is an extension of these models to support collective communication using the optimistic event scheduling capability of ROSS. We demonstrate that both...
Critical to the scalability of parallel adaptive simulations are control functions including load balancing, reduced inter-process communication, and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes, which must be transmitted efficiently to avoid performance degradation when neighbors reside on different processors. This article presents an algorithm for creating and deleting data copies, referred to as ghost copies, that localize...
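The core idea behind ghost copies can be sketched in a few lines: each rank needs local copies of the off-rank neighbors of the mesh entities it owns. This is an illustrative toy (the names `partition` and `adjacency` are hypothetical), not the algorithm from the article, which would exchange these lists via MPI and handle deletion and updates:

```python
def ghost_layers(partition, adjacency):
    """For each rank, collect the off-rank neighbors of its owned mesh
    entities; these are the ghost copies that rank must receive.
    `partition` maps entity -> owning rank; `adjacency` maps
    entity -> list of neighboring entities."""
    ghosts = {}
    for ent, owner in partition.items():
        for nbr in adjacency[ent]:
            if partition[nbr] != owner:
                ghosts.setdefault(owner, set()).add(nbr)
    return ghosts

# 1D chain of 6 elements split across two ranks.
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(ghost_layers(part, adj))  # {0: {3}, 1: {2}}
```

With ghosts in place, each rank reads neighborhood data locally instead of issuing a remote query per access.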
Two-tiered direct network topologies such as Dragonflies have been proposed for future post-petascale and exascale machines, since they provide a high-radix, low-diameter, fast interconnection network. Such topologies call for redesigning MPI collective communication algorithms in order to attain the best performance. Yet as increasingly many applications share a machine, it is not clear how these topology-aware algorithms will react to interference with concurrent jobs accessing the same network. In this paper, we study three broadcast...
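For context on what such broadcast algorithms look like, the classic binomial-tree broadcast doubles the set of informed ranks each round, finishing in ceil(log2(n)) rounds. The sketch below illustrates that baseline schedule only; it is not one of the topology-aware variants studied in the paper:

```python
def binomial_broadcast_schedule(n):
    """Schedule of a binomial-tree broadcast among ranks 0..n-1:
    in round r, every rank that already holds the message sends it
    to rank + 2**r (if that rank exists). Returns one list of
    (src, dst) pairs per round."""
    have = {0}
    rounds = []
    r = 0
    while len(have) < n:
        sends = [(src, src + 2**r) for src in sorted(have)
                 if src + 2**r < n]
        have.update(dst for _, dst in sends)
        rounds.append(sends)
        r += 1
    return rounds

sched = binomial_broadcast_schedule(8)
print(len(sched))  # 3 rounds == ceil(log2(8))
print(sched[0])    # [(0, 1)]
```

Topology-aware variants reorder or regroup these sends so that early rounds stay within a Dragonfly group and expensive global links are crossed as few times as possible.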
Network contention between concurrently running jobs on HPC systems is a primary cause of performance variability. Optimizing job allocation and avoiding network sharing are hence crucial to alleviating the potential degradation. In order to do so effectively, an understanding of the interference among jobs and their communication patterns is required. In this work, we choose three representative applications from the DOE Design Forward Project and conduct detailed simulations with a torus network model to analyze both intra- and...