NFDI4DS | UHH-SEMS - Publication Details

Michael Lang

ORCID: 0000-0002-3498-6352

About

Contact & Profiles

OpenAlex author ID: A5078475557

Research Areas

Parallel Computing and Optimization Techniques
Distributed and Parallel Computing Systems
Advanced Data Storage Technologies
Interconnection Networks and Systems
Cloud Computing and Resource Management
Electrical Fault Detection and Protection
Distributed systems and fault tolerance
Risk and Safety Analysis
Software-Defined Networks and 5G
Refrigeration and Air Conditioning Technologies
Occupational Health and Safety Research
Spacecraft and Cryogenic Technologies
Advanced Optical Network Technologies
Low-power high-performance VLSI design
Embedded Systems Design Techniques
Advanced Thermodynamic Systems and Engines
Advanced Combustion Engine Technologies
Vacuum and Plasma Arcs
Catalytic Processes in Materials Science
Caching and Content Delivery
Peer-to-Peer Network Technologies
Rocket and propulsion systems research
Advanced Memory and Neural Computing
Vehicle emissions and performance
Semiconductor materials and devices

Graz University of Technology
2006-2024

Combustion Institute
2006-2021

Los Alamos National Laboratory
2011-2020

Mersen (United States)
2011-2020

Lawrence Livermore National Laboratory
2019

Sandia National Laboratories California
2017

Association for Computing Machinery
2017

Gorgias Press (United States)
2014

University of California, Irvine
2012

TU Wien
2010

Entering the petaflop era: the architecture and performance of Roadrunner

OPENALEX - Publications

Kevin Barker, Kei Davis, Adolfy Hoisie, Darren J. Kerbyson, Michael Lang, and 2 more

Roadrunner is a 1.38 Pflop/s-peak (double precision) hybrid-architecture supercomputer developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a detailed performance analysis of the system. A case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included. The performance of Sweep3D is compared to that of the code on...

10.5555/1413370.1413372 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2008-11-15

Entering the petaflop era: The architecture and performance of Roadrunner

OPENALEX - Publications

Kevin Barker, Kei Davis, Adolfy Hoisie, Darren J. Kerbyson, Michael Lang, and 2 more

10.1109/sc.2008.5217926 article EN 2008-11-01

Optimizing load balancing and data-locality with data-aware scheduling

OPENALEX - Publications

Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, and 1 more

Load balancing techniques (e.g. work stealing) are important to obtain the best performance for distributed task scheduling systems that have multiple schedulers making scheduling decisions. In work stealing, tasks are randomly migrated from heavy-loaded schedulers to idle ones. However, for data-intensive applications where tasks are dependent and task execution involves processing a large amount of data, migrating tasks blindly yields poor data-locality and incurs significant data-transferring overhead. This work improves work stealing by using both dedicated...

10.1109/bigdata.2014.7004220 article EN 2014 IEEE International Conference on Big Data (Big Data) 2014-10-01

Performance characterization of a DRAM-NVM hybrid memory architecture for HPC applications using Intel Optane DC persistent memory modules

OPENALEX - Publications

Onkar Patil, Latchesar Ionkov, Jason Lee, Frank Mueller, Michael Lang

Non-volatile, byte-addressable memory (NVM) has been introduced by Intel in the form of NVDIMMs named Intel® Optane™ DC PMM. This memory module has the ability to persist the data stored in it without the need for power. This expands the memory hierarchy into a hybrid memory system due to the differences in access latency and bandwidth from DRAM, which has been the predominant main memory technology. The Optane modules have up to 8x the capacity of DDR4 DRAM and can expand the byte-address space to 6 TB per node. Many applications can now scale their problem size given such a memory system. We evaluate...

10.1145/3357526.3357541 article EN Proceedings of the International Symposium on Memory Systems 2019-09-30

Overcoming Hadoop Scaling Limitations through Distributed Task Execution

OPENALEX - Publications

Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, and 4 more

Data driven programming models like MapReduce have gained popularity in large-scale data processing. Although great efforts through the Hadoop implementation and framework decoupling (e.g. YARN, Mesos) have allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, task scheduler and metadata management of the HDFS file system adversely affect Hadoop's scalability to tomorrow's extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed task execution...

10.1109/cluster.2015.42 article EN 2015-09-01

Next generation job management systems for extreme-scale ensemble computing

OPENALEX - Publications

Ke Wang, Xiaobing Zhou, Hao Chen, Michael Lang, Ioan Raicu

With the exponential growth of supercomputers in parallelism, applications are growing more diverse, including traditional large-scale HPC MPI jobs, and ensemble workloads such as finer-grained many-task computing (MTC) applications. Delivering high throughput and low latency for both workloads requires developing a distributed job management system that is magnitudes more scalable than today's centralized ones. In this paper, we present a distributed job launch prototype, SLURM++, which is comprised of multiple controllers with each...

10.1145/2600212.2600703 article EN 2014-06-20

Using simulation to explore distributed key-value stores for extreme-scale system services

OPENALEX - Publications

Ke Wang, Abhishek Kulkarni, Michael Lang, Dorian Arnold, Ioan Raicu

Owing to the significantly high rate of component failures at extreme scales, system services will need to be failure-resistant, adaptive and self-healing. A majority of HPC services are still designed around a centralized paradigm and hence are susceptible to scaling issues. Peer-to-peer services have proved themselves at scale for wide-area internet workloads. Distributed key-value stores (KVS) are widely used as a building block for these services, but are not prevalent in HPC services. In this paper, we simulate KVS for various service...

10.1145/2503210.2503239 article EN 2013-10-30

Using Performance Modeling to Design Large-Scale Systems

OPENALEX - Publications

Kevin Barker, Kei Davis, Adolfy Hoisie, Darren J. Kerbyson, Michael Lang, and 2 more

A methodology for accurately modeling large applications explores the performance of ultrascale systems at different stages in their life cycle, from early design through production use.

10.1109/mc.2009.372 article EN Computer 2009-11-01

Towards Scalable Distributed Workload Manager with Monitoring-Based Weakly Consistent Resource Stealing

OPENALEX - Publications

Ke Wang, Xiaobing Zhou, Kan Qiao, Michael Lang, Benjamin McClelland, and 1 more

One way to efficiently utilize the coming exascale machines is to support a mixture of applications in various domains, such as traditional large-scale HPC, ensemble runs, and fine-grained many-task computing (MTC). Delivering high performance in resource allocation, scheduling and launching for all types of jobs has driven us to develop Slurm++, a distributed workload manager directly extended from the Slurm centralized production system. Slurm++ employs multiple controllers with each one managing a partition...

10.1145/2749246.2749249 article EN 2015-06-08

Architecture: A performance comparison through benchmarking and modeling of three leading supercomputers

OPENALEX - Publications

Adolfy Hoisie, Greg Johnson, Darren J. Kerbyson, Michael Lang, Scott Pakin

This work provides a performance analysis of three leading supercomputers that have recently been deployed: Purple, Red Storm and Blue Gene/L. Each of these machines is architecturally diverse, with very different performance characteristics. Each contains over 10,000 processors and has a system peak of over 40 Teraflops. We analyze each system using a range of micro-benchmarks which include communication performance as well as quantifying the impact of the operating system. The achievable application performance is compared across the systems and confirmed via the use of detailed...

10.1145/1188455.1188534 article EN 2006-01-01

Optimized InfiniBand™ fat‐tree routing for shift all‐to‐all communication patterns

OPENALEX - Publications

Eitan Zahavi, Gregory Johnson, Darren J. Kerbyson, Michael Lang

Clustered systems have become a dominant architecture of scalable high‐performance super computers. In these large‐scale computers, the network performance and scalability is as critical as the compute‐nodes speed. InfiniBand™ has become a commodity networking solution supporting the stringent latency, bandwidth and scalability requirements of these clusters. The network performance is also affected by its topology, packet routing and the communication patterns the distributed application exercises. Fat‐trees are the topology structures used for constructing most...

10.1002/cpe.1527 article EN Concurrency and Computation Practice and Experience 2009-11-17

A PERFORMANCE EVALUATION OF THE NEHALEM QUAD-CORE PROCESSOR FOR SCIENTIFIC COMPUTING

OPENALEX - Publications

Kevin Barker, Kei Davis, Adolfy Hoisie, Darren J. Kerbyson, Michael Lang, and 2 more

In this work we present an initial performance evaluation of Intel's latest, second-generation quad-core processor, Nehalem, and provide a comparison to the first-generation AMD and Intel quad-core processors Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture incorporating QuickPath Interconnect for interconnecting processors within a node, and the first to incorporate an integrated memory controller. We evaluate the suitability of these processors in quad-socket compute nodes as building blocks for large-scale scientific computing...

10.1142/s012962640800351x article EN Parallel Processing Letters 2008-12-01

A new routing scheme for Jellyfish and its performance with HPC workloads

OPENALEX - Publications

Xin Yuan, Santosh Mahapatra, Wickus Nienaber, Scott Pakin, Michael Lang

The jellyfish topology, where switches are connected using a random graph, has recently been proposed for large scale data-center networks. It has been shown to offer higher bisection bandwidth and better permutation throughput than the corresponding fat-tree topology with a similar cost. In this work, we propose a new routing scheme for jellyfish that out-performs existing schemes by more effectively exploiting the path diversity, and comprehensively compare the performance of jellyfish and fat-tree topologies with HPC workloads. The results indicate that both topologies offer comparable high...

10.1145/2503210.2503229 article EN 2013-10-30

Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales

OPENALEX - Publications

Ke Wang, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Tonglin Li, and 2 more

Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as the compute nodes to make scheduling decisions. Achieving load balancing and best exploiting data locality are two important goals for the performance of distributed scheduling. Our previous research proposed a data‐aware...

10.1002/cpe.3617 article EN Concurrency and Computation Practice and Experience 2015-08-14

Effect of electrode orientation in arc flash testing

OPENALEX - Publications

R. Wilkins, M. Allison, Michael Lang

The arc flash hazard calculation method proposed in IEEE 1584 is based on tests with the arcing electrodes in a vertical plane and calorimeters arranged at 90° to this plane. In this paper the results of tests using a test set-up with both vertical and horizontal electrodes are given. High-speed videography and incident energy measurements show that the hazard is much worse with the horizontal orientation. However current-limiting fuses are effective in limiting the incident energy, even in the worst-case conditions, provided the bolted-fault current is high enough to cause them to operate in their current-limiting mode. The values...

10.1109/ias.2005.1518348 article EN Fourtieth IAS Annual Meeting. Conference Record of the 2005 Industry Applications Conference, 2005. 2005-10-24

Topology-custom UGAL routing on dragonfly

OPENALEX - Publications

Md Shafayat Rahman, Saptarshi Bhowmik, Yevgeniy Ryasnianskiy, Xin Yuan, Michael Lang

The Dragonfly network has been deployed in the current generation supercomputers and will be used in the next generation supercomputers. Universal Globally Adaptive Load-balance routing (UGAL) is the state-of-the-art routing scheme for Dragonfly. In this work, we show that the performance of conventional UGAL can be further improved on many practical Dragonfly networks, especially the ones with a small number of groups, by customizing the paths used in UGAL for each topology. We develop a scheme to compute the custom sets of paths for each topology and compare the performance of our topology-custom UGAL routing (T-UGAL) with conventional UGAL. Our...

10.1145/3295500.3356208 article EN 2019-11-07

Trapped Capacity: Scheduling under a Power Cap to Maximize Machine-Room Throughput

OPENALEX - Publications

Ziming Zhang, Michael Lang, Scott Pakin, Song Fu

Power-aware parallel job scheduling has been recognized as a demanding issue in the high-performance computing (HPC) community. The goal is to efficiently allocate and utilize power and energy in machine rooms. In practice the power for machine rooms is well over-provisioned, specified by high-power LINPACK runs or nameplate estimates. This results in a considerable amount of trapped power capacity. Instead of being wasted, this trapped capacity should be reclaimed to accommodate more compute nodes in the machine room and thereby increase system throughput. But to do this, we...

10.1109/e2sc.2014.10 article EN 2014-11-01

Exploring the Design Tradeoffs for Extreme-Scale High-Performance Computing System Software

OPENALEX - Publications

Ke Wang, Abhishek Kulkarni, Michael Lang, Dorian Arnold, Ioan Raicu

Owing to the extreme parallelism and the high component failure rates of tomorrow's exascale, high-performance computing (HPC) system software will need to be scalable, failure-resistant, and adaptive for sustained system operation and full system utilizations. Many of the existing HPC system software are still designed around a centralized server paradigm and hence are susceptible to scaling issues and single points of failure. In this article, we explore the design tradeoffs for scalable system software at extreme scales. We propose a general system software taxonomy by deconstructing common HPC system software into their...

10.1109/tpds.2015.2430852 article EN publisher-specific-oa IEEE Transactions on Parallel and Distributed Systems 2015-05-07

Power usage of production supercomputers and production workloads

OPENALEX - Publications

Scott Pakin, Curtis B. Storlie, Michael Lang, Robert E. Fields, Eloy Romero, and 7 more

Power is becoming an increasingly important concern for large supercomputer centers. However, to date, there has been a dearth of studies of power usage ‘in the wild’, on production supercomputers running production workloads. In this paper, we present the initial results of a project to characterize the power usage of the three Top500 supercomputers at Los Alamos National Laboratory: Cielo, Roadrunner, and Luna (#15, #19, and #47, respectively, on the June 2012 Top500 list). Power measurements taken both at the switchboard level and within the compute racks are presented...

10.1002/cpe.3191 article EN Concurrency and Computation Practice and Experience 2013-12-23

LFTI: A New Performance Metric for Assessing Interconnect Designs for Extreme-Scale HPC Systems

OPENALEX - Publications

Xin Yuan, Santosh Mahapatra, Michael Lang, Scott Pakin

Traditionally, interconnect performance is either characterized by simple topological parameters such as bisection bandwidth or studied through simulation that gives detailed performance information for the scenarios simulated. Neither of these approaches provides a good performance overview for extreme-scale interconnects. The topological parameters are not directly related to application level communication performance, while the simulation complexity limits the number of scenarios that can be investigated. In this work, we propose a new performance metric, called LANL-FSU Throughput Indices (LFTI),...

10.1109/ipdps.2014.38 article EN 2014-05-01

Effect of Insulating Barriers in Arc Flash Testing

OPENALEX - Publications

R. Wilkins, Michael Lang, M. Allison

Low-voltage arc flash testing has been conducted using the standard IEEE 1584 test procedure but modified so that the electrode tips are terminated in an insulating barrier instead of in open air. The barrier prevents downward arc motion, has a stabilizing effect on the arcs, and produces a strong horizontal plasma cloud flow. It also produces shorter arc lengths, higher arcing currents and higher maximum incident energy density, when compared with the arrangement presently used. The erosion of the copper electrodes is much higher than with the arrangement presently used, which causes a larger quantity...

10.1109/tia.2008.2002176 article EN IEEE Transactions on Industry Applications 2008-09-01

Effect of insulating barriers in arc flash testing

OPENALEX - Publications

R. Wilkins, Michael Lang, M. Allison

Low voltage arc flash testing has been conducted using the standard IEEE 1584 test procedure, but with the electrode tips terminated in an insulating barrier instead of in open air. The barrier prevents downwards arc motion, has a stabilizing effect on the arcs, and produces a strong horizontal plasma cloud flow. It also produces shorter arc lengths, higher arcing currents and higher maximum incident energy density, when compared with the standard arrangement. Erosion of the copper electrodes is very high and this causes a much larger quantity of spray to be directed towards...

10.1109/papcon.2008.4585811 article EN 2008-06-01

High Precision Large Deployable Space Reflector Based On Pillow-Effect-Free Technology

OPENALEX - Publications

L. Datashvili, H. Baier, Juergen Schimitschek, Michael Lang, Martin Huber

Existing large deployable reflectors use metal meshes as a reflecting surface (RS) and are characterized by the pillow effect, which in turn reduces the precision of the RS, as evaluated numerically in this paper. The paper focuses on the results of extensive numerical and experimental investigations of SMART (Shell Membrane Antenna Reflector Technology) for thermo-mechanical and radio-frequency (RF) characterization. The developed reflector concept includes a pillow-effect-free RS made of carbon fibre reinforced silicone composite...

10.2514/6.2007-2186 article EN 48th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference 2007-04-23
