NFDI4DS | UHH-SEMS - Publication Details

OPENALEX - Publications

The SP2 High-Performance Switch

Craig Stunkel, D. G. Shea, Bülent Abali, M. G. Atkins, Carl A Bender, and 8 more

The heart of an IBM SP2™ system is the High-Performance Switch, which is a low-latency, high-bandwidth switching network that binds together RISC System/6000® processors. The switch incorporates a unique combination of topology and architectural features to scale aggregate bandwidth, enhance reliability, and simplify cabling. It is a bidirectional multistage interconnect subsystem driven by a common oscillator, and it delivers both data and service packets over the same links. Switching elements contain dynamically allocated...

10.1147/sj.342.0185 article EN IBM Systems Journal 1995-01-01

On the Feasibility of Optical Circuit Switching for High Performance Computing Systems

Kevin Barker, Alan F. Benner, R. Hoare, Adolfy Hoisie, Alex K. Jones, and 8 more

The interconnect plays a key role in both the cost and performance of large-scale HPC systems. The cost of future high-bandwidth electronic interconnects mushrooms due to the expensive optical transceivers needed between switches. We describe a potentially cheaper and more power-efficient approach to building high-performance interconnects. Through empirical analysis of HPC applications, we find that the bulk of inter-processor communication (barring collectives) is bounded in degree and changes very slowly or never. Thus we propose...

10.1109/sc.2005.48 article EN 2005-12-22

UCX: An Open Source Framework for HPC Network APIs and Beyond

Pavel Shamis, Manjunath Gorentla Venkata, M. Graham Lopez, Matthew Baker, Óscar Hernández, and 15 more

This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing. UCX comes from the combined effort of national laboratories, industry, and academia to design and implement a high-performing and highly-scalable network stack for next generation applications and systems. UCX provides the ability to tailor its functionality to suit a wide variety of application domains and hardware. We envision these APIs to satisfy the networking needs of many programming models such as Message Passing Interface...

10.1109/hoti.2015.13 article EN 2015-08-01

Algorithm-based fault tolerance on a hypercube multiprocessor

P. Banerjee, J.T. Rahmeh, Craig Stunkel, Vivek Nair, Kaushik Roy, and 2 more

The design of a fault-tolerant hypercube multiprocessor architecture is discussed. The authors propose the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube using a novel scheme of algorithm-based error detection. System-level error detection mechanisms have been implemented for three applications on a 16-processor Intel iPSC hypercube multiprocessor: matrix multiplication, Gaussian elimination, and fast Fourier transform. Schemes for other applications are under development. Extensive studies have been done of error coverage...

10.1109/12.57055 article EN IEEE Transactions on Computers 1990-01-01
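The abstract above names matrix multiplication as one application protected by algorithm-based error detection. As a minimal sketch of the general idea, the code below uses the classic full-checksum scheme for matrix products (Huang and Abraham): a column-checksum row is appended to A, a row-checksum column to B, and the checksums of the product then locate and correct a single faulty element. This is an assumption about the general technique, not the authors' distributed hypercube implementation; it is sequential pure Python, and the names `matmul`, `add_column_checksum`, `add_row_checksum`, `locate_fault`, and `corrected_value` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add_column_checksum(a: Matrix) -> Matrix:
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b: Matrix) -> Matrix:
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def locate_fault(c: Matrix, tol: float = 1e-9) -> Optional[Tuple[int, int]]:
    """Return (row, col) of a single corrupted data element, or None if consistent.

    c is the full-checksum product: its last row and last column are checksums.
    """
    n_rows, n_cols = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if abs(sum(c[i][:n_cols]) - c[i][n_cols]) > tol]
    bad_cols = [j for j in range(n_cols)
                if abs(sum(c[i][j] for i in range(n_rows)) - c[n_rows][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern not locatable under the single-fault assumption")


def corrected_value(c: Matrix, i: int, j: int) -> float:
    """Recover element (i, j) from its row checksum and the other row elements."""
    n_cols = len(c[0]) - 1
    return c[i][n_cols] - sum(c[i][k] for k in range(n_cols) if k != j)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = matmul(add_column_checksum(a), add_row_checksum(b))
assert locate_fault(c) is None
c[1][0] += 10.0                      # inject a fault into one product element
print(locate_fault(c))               # (1, 0)
print(corrected_value(c, 1, 0))      # 43.0
```

The row check flags the faulty row and the column check flags the faulty column; their intersection is the corrupted element, which the row checksum then restores.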

The SP1 high-performance switch

Craig Stunkel, D. G. Shea, Don Grice, Peter Hochschild, M. Tsao

The IBM Scalable POWERparallel Systems 9076 SP1 connects RISC System/6000 processors via a communication network called the high-performance switch. This switch, based upon the Vulcan parallel processor, incorporates a number of unusual features to enhance reliability, diagnose faults, and simplify cabling. This paper examines the switch architecture and implementation and overviews the support software. The switch is a bidirectional MIN and provides at least 4 usable redundant paths for most pairs of communicating nodes.

10.1109/shpcc.1994.296638 article EN 2002-12-17

Architecture and implementation of Vulcan

Craig Stunkel, Monty Denneau, B.J. Nathanson, D. G. Shea, Peter Hochschild, and 4 more

IBM's recently announced Scalable POWERparallel family of systems is based upon the Vulcan architecture, and the currently available 9076 SP1 parallel system utilizes fundamental Vulcan technology. The experimental Vulcan parallel processor is designed to scale to many thousands of microprocessor-based nodes. To support a machine of this size, the nodes and network incorporate a number of unusual features to scale aggregate bandwidth, enhance reliability, diagnose faults, and simplify cabling. The network is a multistage, unified data and service network driven by a single oscillator...

10.1109/ipps.1994.288290 article EN 2002-12-17

TRAPEDS: producing traces for multicomputers via execution driven simulation

Craig Stunkel, W.K. Fuchs

Trace-driven simulation is an important aid in performance analysis of computer systems. Capturing address traces for these simulations is a difficult problem for single processors and particularly for multicomputers. Even when existing trace methods can be used on multicomputers, the amount of collected data typically grows with the number of processors, so I/O and storage costs increase. A new technique is presented in this paper which modifies the executable code to dynamically collect the address trace from the user code and analyzes the trace during execution...

10.1145/75108.75380 article EN 1989-04-01
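The key idea in this abstract, analyzing addresses while the program runs so that no trace has to be stored, can be illustrated with a small sketch. TRAPEDS itself rewrites executable code; here a hand-instrumented loop stands in for the rewritten user code, and a direct-mapped cache simulator is an assumed example of the on-the-fly analysis. `DirectMappedCache` and `instrumented_sum` are hypothetical names.

```python
from typing import Callable, Dict, List


class DirectMappedCache:
    """Direct-mapped cache model that consumes addresses as they are generated."""

    def __init__(self, num_lines: int, line_bytes: int) -> None:
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags: Dict[int, int] = {}   # cache index -> tag currently stored
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        block = addr // self.line_bytes
        index, tag = block % self.num_lines, block // self.num_lines
        if self.tags.get(index) == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag


def instrumented_sum(values: List[int], base_addr: int, elem_bytes: int,
                     on_load: Callable[[int], None]) -> int:
    """Stand-in for user code after instrumentation: each load reports its address."""
    total = 0
    for i, v in enumerate(values):
        on_load(base_addr + i * elem_bytes)   # hypothetical inserted trace hook
        total += v
    return total


cache = DirectMappedCache(num_lines=8, line_bytes=32)
print(instrumented_sum(list(range(64)), base_addr=0, elem_bytes=8, on_load=cache.access))  # 2016
print(cache.hits, cache.misses)  # 48 16
```

Only the running hit and miss counts are kept, so storage stays constant however many addresses are produced, instead of growing with the trace length.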

The high-speed networks of the Summit and Sierra supercomputers

Craig Stunkel, Richard L. Graham, Gilad Shainer, M. Kagan, Sameh Sharkawi, and 2 more

Oak Ridge National Laboratory's Summit supercomputer and Lawrence Livermore National Laboratory's Sierra supercomputer utilize an InfiniBand interconnect in a Fat-tree network topology, interconnecting all compute nodes, storage nodes, administration nodes, and management nodes into one linearly scalable network. These networks are based on Mellanox 100-Gb/s EDR ConnectX-5 adapters and Switch-IB2 switches, with compute-rack packaging and cooling contributions from IBM. These devices support in-network computing acceleration engines such as Scalable...

10.1147/jrd.2020.2967330 article EN IBM Journal of Research and Development 2020-05-01

Address tracing for parallel machines

Craig Stunkel, Bob Janssens, W.K. Fuchs

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to the collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared and distributed memory multiprocessors are examined separately.

10.1109/2.67191 article EN Computer 1991-01-01

Implementing multidestination worms in switch-based parallel systems

Craig Stunkel, R. Sivaram, Dhabaleswar K. Panda

Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is non-trivial. In this paper we propose alternative switch architectures with differing buffer organizations to implement multidestination worms in switch-based parallel systems. First, we discuss issues related to such an implementation (deadlock-freedom, replication mechanisms, header encoding, and routing)...

10.1145/264107.264129 article EN 1997-05-01

Efficient broadcast and multicast on multistage interconnection networks using multiport encoding

R. Sivaram, D.K. Panda, Craig Stunkel

This paper proposes a new approach for implementing fast multicast and broadcast in unidirectional and bidirectional multistage interconnection networks (MINs) with multiport encoded multidestination worms. For a MIN with $n$ stages, such worms use $n$ header flits each. One flit is used for each stage of the network, and it indicates the output ports to which the message needs to be replicated. A worm with replication degrees $(d_1, d_2, \ldots, d_n)$, $1 \le d_i \le k$, at the respective stages is capable of covering $d_1$...

10.1109/71.730529 article EN IEEE Transactions on Parallel and Distributed Systems 1998-01-01
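To make the per-stage header encoding concrete, the sketch below computes the destinations reached by one worm, assuming a unidirectional MIN of $k \times k$ switches with destination-tag routing, in which stage $s$ consumes the $s$-th most significant base-$k$ digit of the destination address. Each header flit is modeled as a $k$-bit mask of output ports; this mask layout and the name `covered_destinations` are assumptions for illustration, since the abstract does not give the flit format.

```python
from itertools import product
from typing import List


def covered_destinations(port_masks: List[int], k: int) -> List[int]:
    """Destinations reached by one multiport-encoded worm.

    port_masks[s] is the header flit for stage s: bit p set means the switch
    replicates the worm to output port p. Assumes destination-tag routing in
    which stage s selects the s-th most significant base-k address digit.
    """
    per_stage_ports = []
    for mask in port_masks:
        if not 0 < mask < (1 << k):
            raise ValueError("each flit must select between 1 and k output ports")
        per_stage_ports.append([p for p in range(k) if (mask >> p) & 1])

    destinations = []
    for digits in product(*per_stage_ports):   # one port choice per stage
        addr = 0
        for d in digits:
            addr = addr * k + d
        destinations.append(addr)
    return destinations


# 64-node MIN of 4x4 switches (k = 4, n = 3 stages), replication degrees (2, 1, 2)
dests = covered_destinations([0b0011, 0b0001, 0b1010], k=4)
print(dests)        # [1, 3, 17, 19]
print(len(dests))   # 4
```

Under this model the number of destinations covered is the product of the per-stage replication degrees ($2 \cdot 1 \cdot 2 = 4$ here), and a worm whose flits set every bit reaches all $k^n$ nodes with a single startup.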

An evaluation of system-level fault tolerance on the Intel hypercube multiprocessor

P. Banerjee, J.T. Rahmeh, Craig Stunkel, Vivek Nair, Kaushik Roy, and 1 more

A discussion is presented of a fault-tolerant hypercube multiprocessor architecture which uses a novel algorithm-based fault-detection approach for identifying faulty processors. The scheme involves the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube. The authors have implemented system-level mechanisms for various applications on a 16-processor Intel iPSC hypercube multiprocessor. They report results for two applications: matrix multiplication and fast Fourier transform. ...performed...

10.1109/ftcs.1988.5344 article EN 1988-01-01

Performance Benefits of Optical Circuit Switches for Large-Scale Dragonfly Networks

Cyriel Minkenberg, Germán Rodríguez, Bogdan Prisacari, Laurent Schares, Philip Heidelberger, and 2 more

We propose Optical Circuit Switching for dynamically creating reconfigurable partitions in large-scale systems with Dragonfly networks. Up to 2x execution-time improvement is demonstrated for global traffic patterns in a >13,000-node system using a production-grade network simulator.

10.1364/ofc.2016.w3j.3 article EN Optical Fiber Communication Conference 2016-01-01

Adaptive source routing in multistage interconnection networks

Yucel Aydogan, Craig Stunkel, Cevdet Aykanat, Bülent Abali

We describe the adaptive source routing (ASR) method, which is a first attempt to combine adaptive and source routing methods. In ASR, the adaptivity of each packet is determined at the source processor. Every packet can be routed in a fully adaptive, partially adaptive, or non-adaptive manner, all within the same network at the same time. Using simulations, we evaluate the performance of the proposed method and compare it with that of networks using oblivious routing. We also describe a route generation algorithm that determines maximally adaptive routes in multistage networks.

10.1109/ipps.1996.508067 article EN Proceedings of the International Conference on Parallel Processing 2002-12-23
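One way to picture per-packet adaptivity chosen at the source is a route that carries, for each stage, the set of output ports the switch may use. The sketch below assumes such an encoding (one bit mask per stage) and a least-loaded selection rule at each switch; both are illustrative assumptions, not the ASR header format or the authors' route generation algorithm, and `choose_port` and `forward` are hypothetical names. A real route generator must permit only ports from which the destination remains reachable; the masks here are taken as already satisfying that.

```python
from typing import List, Sequence


def choose_port(allowed_mask: int, queue_lengths: Sequence[int]) -> int:
    """Least-loaded output port among those the source permits (ties: lowest index)."""
    candidates = [p for p in range(len(queue_lengths)) if (allowed_mask >> p) & 1]
    if not candidates:
        raise ValueError("route permits no output port at this stage")
    return min(candidates, key=lambda p: (queue_lengths[p], p))


def forward(route: Sequence[int], queues: List[List[int]]) -> List[int]:
    """Send one packet through the stages; return the output port used at each.

    route[s] is a bit mask of permitted ports at stage s: one set bit is a
    non-adaptive hop, several set bits leave the choice to the switch.
    queues[s] holds the current output-queue lengths seen at stage s.
    """
    path = []
    for mask, stage_queues in zip(route, queues):
        port = choose_port(mask, stage_queues)
        stage_queues[port] += 1        # the packet now occupies that queue
        path.append(port)
    return path


load = [[3, 0, 2, 5], [1, 1, 0, 4]]   # queue lengths at two stages of 4-port switches
print(forward([0b0001, 0b1000], [q[:] for q in load]))  # non-adaptive: [0, 3]
print(forward([0b1111, 0b0011], [q[:] for q in load]))  # adaptive: [1, 0]
```

Both packets traverse the same network model; only the masks set at the source differ, so non-adaptive and (partially) adaptive packets coexist without any change to the switches.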

HIPIQS: a high-performance switch architecture using input queuing

R. Sivaram, Craig Stunkel, D.K. Panda

Switch-based interconnects are used in a number of application domains, including parallel system interconnects, local area networks, and wide area networks. However, very few switches have been designed that are suitable for more than one of these domains. Such a switch must offer both extremely low latency and high throughput for a variety of different message sizes. While some switch architectures with output queuing have been shown to perform well in terms of throughput, their performance can suffer when used in systems where a significant...

10.1109/71.993207 article EN IEEE Transactions on Parallel and Distributed Systems 2002-03-01

A new switch chip for IBM RS/6000 SP systems

Craig Stunkel, Jay Herring, Bülent Abali, R. Sivaram

SC '99: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing, January 1999, pages 16–es. Affiliations given for the first two authors: Craig B. Stunkel, T.J. Watson Research Center, Yorktown Heights, NY; Jay Herring, Server Division, Poughkeepsie.

10.1145/331532.331548 article EN 1999-01-01

TRAPEDS: producing traces for multicomputers via execution driven simulation

Craig Stunkel, W.K. Fuchs

Trace-driven simulation is an important aid in performance analysis of computer systems. Capturing address traces for these simulations is a difficult problem for single processors and particularly for multicomputers. Even when existing trace methods can be used on multicomputers, the amount of collected data typically grows with the number of processors, so I/O and storage costs increase. A new technique is presented in this paper which modifies the executable code to dynamically collect the address trace from the user code and analyzes the trace during execution...

10.1145/75372.75380 article EN ACM SIGMETRICS Performance Evaluation Review 1989-04-01

Hypercube implementation of the simplex algorithm

Craig Stunkel, Daniel A. Reed

Large, sparse, linear systems of equations arise frequently when constructing mathematical models of natural phenomena. Most often, these linear systems are fully constrained and can be solved via direct or iterative techniques. However, one important problem class requires solutions to underconstrained linear systems that maximize some objective function. These optimization problems are formulations of many business plans and often contain hundreds of equations with thousands of variables. Historically, these problems have been solved via the simplex method. Despite...

10.1145/63047.63104 article EN 1988-01-01

Optimizing Application Performance with BlueField: Accelerating Large-Message Blocking and Nonblocking Collective Operations

Richard L. Graham, George Bosilca, Yong Qin, Bradley W. Settlemyer, Gilad Shainer, and 6 more

With the end of Dennard scaling, specializing and distributing compute engines throughout the system is a promising technique to improve application performance. For example, NVIDIA's BlueField Data Processing Unit (DPU) integrates programmable processing elements within the network and offers specialized capabilities. These capabilities enable communication via offloads onto DPUs and present new application opportunities for offloading nonblocking or complex communication patterns such as collective operations. This...

10.23919/isc.2024.10528935 article EN 2024-05-01

A reliable hardware barrier synchronization scheme

R. Sivaram, Craig Stunkel, Dhabaleswar K. Panda

Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier synchronization through software, hardware, or a combination of these mechanisms. However, few schemes emphasize fault-tolerant operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based barrier synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated message-passing...

10.1109/ipps.1997.580908 article EN 2002-11-22

An evaluation of network architectures for next generation supercomputers

Dong Chen, Philip Heidelberger, Craig Stunkel, Yutaka Sugawara, Cyriel Minkenberg, and 2 more

We survey network topologies, in particular networks with full all-to-all bandwidth scaling. For a more detailed study, we select several recently introduced, promising networks that are cheaper than a 3-level Fat-tree. Through a combination of analysis and simulation on selected supercomputer workloads, we compare these networks according to desirable properties such as robust performance, low cost, and partitionability. We conclude with observations for future systems.

10.5555/3019057.3019059 article EN 2016-11-13

Where to provide support for efficient multicasting in irregular networks: network interface or switch?

R. Sivaram, Ram Kesavan, D.K. Panda, Craig Stunkel

Recent research has proposed methods for enhancing the performance of multicast in networks with irregular topologies. These methods fall into two broad categories: (a) network interface (NI) based schemes that make use of enhanced functionality of the software/firmware running at the NI processor; and (b) switch-based enhancements to the switch architecture to support hardware multicast. However, it is not clear how these schemes compare to each other and when it makes sense to use one over the other. In order to answer such questions, we perform a...

10.1109/icpp.1998.708517 article EN 2002-11-27

Efficient broadcast and multicast on multistage interconnection networks using multiport encoding

R. Sivaram, D.K. Panda, Craig Stunkel

This paper proposes a new approach for implementing fast multicast and broadcast in multistage interconnection networks (MINs) with multiport encoded multidestination worms. For a MIN with $k \times k$ switches and $n$ stages, such worms use $n$ header flits each. One flit is used for each stage of the network, and it indicates the output ports to which the message must be replicated. A single worm has the capability to cover a large number of destinations with a single communication startup. A switch architecture is proposed for implementing such worms without deadlock. Grouping...

10.1109/spdp.1996.570314 article EN 2002-12-23

An Evaluation of Network Architectures for Next Generation Supercomputers

Dong Chen, Philip Heidelberger, Craig Stunkel, Yutaka Sugawara, Cyriel Minkenberg, and 2 more

We survey network topologies, in particular networks with full all-to-all bandwidth scaling. For a more detailed study, we select several recently introduced, promising networks that are cheaper than a 3-level Fat-tree. Through a combination of analysis and simulation on selected supercomputer workloads, we compare these networks according to desirable properties such as robust performance, low cost, and partitionability. We conclude with observations for future systems.

10.1109/pmbs.2016.007 article EN 2016-11-01

Address tracing of parallel systems via TRAPEDS

Craig Stunkel, Bob Janssens, W.K. Fuchs

10.1016/0141-9331(92)90067-4 article EN Microprocessors and Microsystems 1992-01-01