Michael Lang

ORCID: 0000-0002-3498-6352
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Cloud Computing and Resource Management
  • Electrical Fault Detection and Protection
  • Distributed systems and fault tolerance
  • Risk and Safety Analysis
  • Software-Defined Networks and 5G
  • Refrigeration and Air Conditioning Technologies
  • Occupational Health and Safety Research
  • Spacecraft and Cryogenic Technologies
  • Advanced Optical Network Technologies
  • Low-power high-performance VLSI design
  • Embedded Systems Design Techniques
  • Advanced Thermodynamic Systems and Engines
  • Advanced Combustion Engine Technologies
  • Vacuum and Plasma Arcs
  • Catalytic Processes in Materials Science
  • Caching and Content Delivery
  • Peer-to-Peer Network Technologies
  • Rocket and propulsion systems research
  • Advanced Memory and Neural Computing
  • Vehicle emissions and performance
  • Semiconductor materials and devices

Graz University of Technology
2006-2024

Combustion Institute
2006-2021

Los Alamos National Laboratory
2011-2020

Mersen (United States)
2011-2020

Lawrence Livermore National Laboratory
2019

Sandia National Laboratories California
2017

Association for Computing Machinery
2017

Gorgias Press (United States)
2014

University of California, Irvine
2012

TU Wien
2010

Roadrunner is a 1.38 Pflop/s-peak (double precision) hybrid-architecture supercomputer developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors, together with AMD Opteron cores, in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a performance analysis of the system. A case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included. The performance is compared to that of the code on...

10.5555/1413370.1413372 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2008-11-15

10.1109/sc.2008.5217926 article EN 2008-11-01

Load balancing techniques (e.g. work stealing) are important to obtain the best performance for distributed task scheduling systems that have multiple schedulers making decisions. In work stealing, tasks are randomly migrated from heavily loaded schedulers to idle ones. However, for data-intensive applications where tasks are dependent and task execution involves processing a large amount of data, migrating tasks blindly yields poor data-locality and incurs significant data-transferring overhead. This work improves work stealing by using both dedicated...

10.1109/bigdata.2014.7004220 article EN 2014 IEEE International Conference on Big Data (Big Data) 2014-10-01

Non-volatile, byte-addressable memory (NVM) has been introduced by Intel in the form of NVDIMMs named Intel® Optane™ DC PMM. This memory module has the ability to persist the data stored in it without the need for power. This expands the memory hierarchy into a hybrid memory system due to the differences in access latency and memory bandwidth from DRAM, which has been the predominant main memory technology. The Optane DC modules have up to 8x the capacity of DDR4 DRAM and can expand the byte-address space to 6 TB per node. Many applications can now scale their problem size given such a memory system. We evaluate...

10.1145/3357526.3357541 article EN Proceedings of the International Symposium on Memory Systems 2019-09-30

Data driven programming models like MapReduce have gained popularity in large-scale data processing. Although great efforts through the Hadoop implementation and framework decoupling (e.g. YARN, Mesos) have allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, task scheduler and metadata management of the HDFS file system adversely affect Hadoop's scalability to tomorrow's extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed execution...

10.1109/cluster.2015.42 article EN 2015-09-01

With the exponential growth of supercomputers in parallelism, applications are growing more diverse, including traditional large-scale HPC MPI jobs, and ensemble workloads such as finer-grained many-task computing (MTC) applications. Delivering high throughput and low latency for both workloads requires developing a distributed job management system that is magnitudes more scalable than today's centralized ones. In this paper, we present a distributed job launch prototype, SLURM++, which is comprised of multiple controllers with each...

10.1145/2600212.2600703 article EN 2014-06-20

Owing to the significantly high rate of component failures at extreme scales, system services will need to be failure-resistant, adaptive and self-healing. A majority of HPC services are still designed around a centralized paradigm and hence are susceptible to scaling issues. Peer-to-peer services have proved themselves at scale for wide-area internet workloads. Distributed key-value stores (KVS) are widely used as a building block for these services, but are not prevalent in HPC services. In this paper, we simulate KVS for various service...

10.1145/2503210.2503239 article EN 2013-10-30

A methodology for accurately modeling large applications explores the performance of ultrascale systems at different stages in their life cycle, from early design through production use.

10.1109/mc.2009.372 article EN Computer 2009-11-01

One way to efficiently utilize the coming exascale machines is to support a mixture of applications in various domains, such as traditional large-scale HPC, ensemble runs, and fine-grained many-task computing (MTC). Delivering high performance in resource allocation, scheduling and launching for all types of jobs has driven us to develop Slurm++, a distributed workload manager directly extended from the Slurm centralized production system. Slurm++ employs multiple controllers with each one managing a partition...

10.1145/2749246.2749249 article EN 2015-06-08

This work provides a performance analysis of three leading supercomputers that have recently been deployed: Purple, Red Storm and Blue Gene/L. Each of these machines is architecturally diverse, with very different performance characteristics. Each contains over 10,000 processors and has a system peak of over 40 Teraflops. We analyze each system using a range of micro-benchmarks which include communication performance as well as quantifying the impact of the operating system. The achievable application performance is compared across the systems. The performance is confirmed via the use of detailed...

10.1145/1188455.1188534 article EN 2006-01-01

Clustered systems have become a dominant architecture of scalable high‐performance super computers. In these large‐scale computers, the network performance and scalability is as critical as the compute‐nodes speed. InfiniBand™ has become a commodity networking solution supporting the stringent latency, bandwidth and scalability requirements of these clusters. The network performance is also affected by its topology, packet routing and the communication patterns the distributed application exercises. Fat‐trees are the topology structures used for constructing most...

10.1002/cpe.1527 article EN Concurrency and Computation Practice and Experience 2009-11-17

In this work we present an initial performance evaluation of Intel's latest, second-generation quad-core processor, Nehalem, and provide a comparison to the first-generation AMD and Intel quad-core processors Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture incorporating QuickPath Interconnect for interconnecting processors within a node, and the first to incorporate an integrated memory controller. We evaluate the suitability of these processors in quad-socket compute nodes as building blocks for large-scale scientific computing...

10.1142/s012962640800351x article EN Parallel Processing Letters 2008-12-01

The jellyfish topology where switches are connected using a random graph has recently been proposed for large scale data-center networks. It has been shown to offer higher bisection bandwidth and better permutation throughput than the corresponding fat-tree topology with similar cost. In this work, we propose a new routing scheme for jellyfish that out-performs existing schemes by more effectively exploiting the path diversity, and comprehensively compare the performance of jellyfish and fat-tree topologies with HPC workloads. The results indicate that both topologies offer comparable high...

10.1145/2503210.2503229 article EN 2013-10-30

Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as the compute nodes to make scheduling decisions. Achieving load balancing and best exploiting data locality are two important goals for the performance of distributed scheduling. Our previous research proposed a data‐aware...

10.1002/cpe.3617 article EN Concurrency and Computation Practice and Experience 2015-08-14

The arc flash hazard calculation method proposed in IEEE 1584 is based on tests with the arcing electrodes in a vertical plane and calorimeters arranged at 90° to this plane. In this paper the results of tests using a test set-up with both vertical and horizontal electrodes are given. High-speed videography and incident energy measurements show that the hazard is much worse with the horizontal orientation. However, current-limiting fuses are effective in limiting the incident energy, even in the worst-case conditions, provided that the bolted-fault current is high enough to cause them to operate in their current-limiting mode. The values...

10.1109/ias.2005.1518348 article EN Fourtieth IAS Annual Meeting. Conference Record of the 2005 Industry Applications Conference, 2005. 2005-10-24

The Dragonfly network has been deployed in the current generation supercomputers and will be used in the next generation supercomputers. The Universal Globally Adaptive Load-balance routing (UGAL) is the state-of-the-art routing scheme for Dragonfly. In this work, we show that the performance of the conventional UGAL can be further improved on many practical Dragonfly networks, especially the ones with a small number of groups, by customizing the paths used in UGAL for each topology. We develop a scheme to compute the custom sets of paths for each topology and compare the performance of our topology-custom UGAL routing (T-UGAL) with conventional UGAL. Our...

10.1145/3295500.3356208 article EN 2019-11-07

Power-aware parallel job scheduling has been recognized as a demanding issue in the high-performance computing (HPC) community. The goal is to efficiently allocate and utilize power and energy in machine rooms. In practice the power for machine rooms is well over-provisioned, specified by high-power LINPACK runs or nameplate estimates. This results in a considerable amount of trapped power capacity. Instead of being wasted, this capacity should be reclaimed to accommodate more compute nodes in the machine room and thereby increase system throughput. But do we...

10.1109/e2sc.2014.10 article EN 2014-11-01

Owing to the extreme parallelism and the high component failure rates of tomorrow's exascale, high-performance computing (HPC) system software will need to be scalable, failure-resistant, and adaptive for sustained system operation and full system utilizations. Many of the existing HPC system software are still designed around a centralized server paradigm and hence are susceptible to scaling issues and single points of failure. In this article, we explore the design tradeoffs for scalable system software at extreme scales. We propose a general system software taxonomy by deconstructing common HPC system software into their...

10.1109/tpds.2015.2430852 article EN publisher-specific-oa IEEE Transactions on Parallel and Distributed Systems 2015-05-07

Power is becoming an increasingly important concern for large supercomputer centers. However, to date, there has been a dearth of studies of power usage ‘in the wild’, that is, on production supercomputers running production workloads. In this paper, we present the initial results of a project to characterize the power usage of the three Top500 supercomputers at Los Alamos National Laboratory: Cielo, Roadrunner, and Luna (#15, #19, and #47, respectively, on the June 2012 Top500 list). Power measurements taken both at the switchboard level and within the compute racks are presented...

10.1002/cpe.3191 article EN Concurrency and Computation Practice and Experience 2013-12-23

Traditionally, interconnect performance is either characterized by simple topological parameters such as bisection bandwidth or studied through simulation that gives detailed performance information for the scenarios simulated. Neither of these approaches provides a good performance overview for extreme-scale interconnects. The topological parameters are not directly related to application level communication performance, while the simulation complexity limits the number of scenarios that can be investigated. In this work, we propose a new performance metric, called LANL-FSU Throughput Indices (LFTI),...

10.1109/ipdps.2014.38 article EN 2014-05-01

Low-voltage arc flash testing has been conducted using the standard IEEE 1584 test procedure but modified so that the electrode tips are terminated in an insulating barrier instead of in the open air. The barrier prevents the downward arc motion, has a stabilizing effect on the arcs, and produces a strong horizontal plasma cloud flow. It also produces shorter arc lengths, higher arcing currents and higher maximum incident energy density, when compared with the arrangement presently used. The erosion of the copper electrodes is much higher than with the arrangement presently used, which causes a larger quantity...

10.1109/tia.2008.2002176 article EN IEEE Transactions on Industry Applications 2008-09-01

Low voltage arc flash testing has been conducted using the standard IEEE 1584 test procedure, but with the electrode tips terminated in an insulating barrier instead of in the open air. The barrier prevents the downwards arc motion, has a stabilizing effect on the arcs, and produces a strong horizontal plasma cloud flow. It also produces shorter arc lengths, higher arcing currents and higher maximum incident energy density, when compared with the standard arrangement. Erosion of the copper electrodes is very high and this causes a much larger quantity of spray to be directed towards...

10.1109/papcon.2008.4585811 article EN 2008-06-01

Existing large deployable reflectors use metal meshes as a reflecting surface (RS) and are characterized by a pillow effect, which in turn reduces the precision of the RS, as evaluated numerically in this paper. The paper focuses on the results of extensive numerical and experimental investigations of the SMART (Shell Membrane Antenna Reflector Technology) reflector for thermo-mechanical and radio-frequency (RF) characterization. The developed reflector concept includes a pillow-effect-free RS made of carbon fibre reinforced silicone composite...

10.2514/6.2007-2186 article EN 48th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference 2007-04-23