NFDI4DS | UHH-SEMS - Publication Details

Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

OPENALEX - Publications

Gregor Daiß P. Diehl Hartmut Kaiser Dirk Pflüger

Ranging from NVIDIA GPUs to AMD and Intel GPUs: Given the heterogeneity of available accelerator cards within current supercomputers, portability is a key aspect for modern HPC applications. In Octo-Tiger, we rely on Kokkos and its various execution spaces for portable compute kernels. In turn, we use HPX to coordinate kernel launches, CPU tasks, and communication. This combination allows us to have a fine interleaving between CPU/GPU computations and communication, enabling scalability on supercomputers. However, work...

10.1145/3585341.3585354 article EN International Workshop on OpenCL 2023-04-06

Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

OPENALEX - Publications

P. Diehl Gregor Daiß Kevin Huck Dominic Marcello Sagiv Shiber and 2 more

The increasing availability of machines relying on non-GPU architectures, such as ARM A64FX in high-performance computing, provides a set of interesting challenges to application developers. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures have to be highly adaptable in terms of compute kernel sizes to accommodate execution characteristics for various heterogeneous workloads. In this paper, we demonstrate an approach to code and performance portability that...

10.1109/ipdpsw59300.2023.00116 article EN 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2023-05-01

From piz daint to the stars

OPENALEX - Publications

Gregor Daiß Parsa Amini John Biddiscombe P. Diehl Juhan Frank and 5 more

We study the simulation of stellar mergers, which requires complex simulations with high computational demands. We have developed Octo-Tiger, a finite volume grid-based hydrodynamics code with Adaptive Mesh Refinement which is unique in conserving both linear and angular momentum to machine precision. To face the challenge of increasingly complex, diverse, and heterogeneous HPC systems, Octo-Tiger relies on high-level programming abstractions. We use HPX with its futurization capabilities to ensure scalability between nodes...

10.1145/3295500.3356221 preprint EN 2019-11-07

Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX

OPENALEX - Publications

Gregor Daiß Mikael Simberg Auriane Reverdell John Biddiscombe Theresa Pollinger and 2 more

Between a widening range of GPU vendors and the trend of having more GPUs per compute node in supercomputers such as Summit, Perlmutter, Frontier, and Aurora, developing performant yet portable distributed HPC applications becomes ever more challenging. Leveraging existing solutions like Kokkos for platform-independent code and HPX for distributing the application in a task-based fashion can alleviate these challenges. However, using such frameworks in the same application requires them to work together seamlessly. In this work, we present an...

10.1109/ipdpsw52791.2021.00066 article EN 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2021-06-01

Accelerating Octo-Tiger

OPENALEX - Publications

David Pfander Gregor Daiß Dominic Marcello Hartmut Kaiser Dirk Pflüger

The optimization of the performance of complex simulation codes with high computational demands, such as Octo-Tiger, is an ongoing challenge. Octo-Tiger is an astrophysics code simulating the evolution of star systems based on the fast multipole method using adaptive octrees as the central data structure. Octo-Tiger was implemented using high-level C++ libraries, specifically HPX and Vc, which allows its use on different hardware platforms. Recently, we have demonstrated excellent scalability in a distributed setting.

10.1145/3204919.3204938 article EN 2018-05-02

Simulating stellar merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

OPENALEX - Publications

P. Diehl Gregor Daiß Kevin Huck Dominic Marcello Sagiv Shiber and 2 more

10.1007/s11227-024-06113-w article EN The Journal of Supercomputing 2024-04-18

Preparing for HPC on RISC-V: Examining Vectorization and Distributed Performance of an Astrophysics Application with HPX and Kokkos

OPENALEX - Publications

P. Diehl Panagiotis Syskakis Gregor Daiß Steven R. Brandt Alireza Kheirkhahan and 5 more

10.1109/scw63240.2024.00207 article EN 2024-11-17

Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

OPENALEX - Publications

P. Diehl Gregor Daiß Kevin Huck Dominic Marcello Sagiv Shiber and 2 more

The increasing availability of machines relying on non-GPU architectures, such as ARM A64FX in high-performance computing, provides a set of interesting challenges to application developers. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures have to be highly adaptable in terms of compute kernel sizes to accommodate execution characteristics for various heterogeneous workloads. In this paper, we demonstrate an approach to code and performance portability that...

10.48550/arxiv.2304.11002 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Distributed, combined CPU and GPU profiling within HPX using APEX

OPENALEX - Publications

P. Diehl Gregor Daiß Kevin Huck Dominic Marcello Sagiv Shiber and 4 more

Benchmarking and comparing the performance of a scientific simulation across hardware platforms is a complex task. When the simulation in question is constructed with an asynchronous, many-task (AMT) runtime offloading work to GPUs, the task becomes even more complex. In this paper, we discuss the use of a uniquely suited performance measurement library, APEX, to capture the behavior of a simulation built on HPX, a highly scalable, distributed AMT runtime. We examine the astrophysics simulation carried out by Octo-Tiger on two different supercomputing architectures. We analyze...

10.48550/arxiv.2210.06437 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Heterogeneous Distributed Big Data Clustering on Sparse Grids

OPENALEX - Publications

David Pfander Gregor Daiß Dirk Pflüger

Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant...

10.3390/a12030060 article EN cc-by Algorithms 2019-03-07

Performance-Portable Distributed k-Nearest Neighbors using Locality-Sensitive Hashing and SYCL

OPENALEX - Publications

Marcel Breyer Gregor Daiß Dirk Pflüger

In the age of data collection, machine learning algorithms have to be able to efficiently cope with vast data sets. This requires scalable and efficient implementations that can cope with heterogeneous hardware. We propose a new, performance-portable implementation of a well-known, robust, and versatile multi-class classification method that supports multiple Graphics Processing Units (GPUs) from different vendors. It is based on the approximate k-nearest neighbors (k-NN) algorithm in SYCL. The k-NN assigns a class to a data point...

10.1145/3456669.3456692 article EN International Workshop on OpenCL 2021-04-27

Hydrodynamic simulations of WD-WD mergers and the origin of RCB stars

OPENALEX - Publications

Sagiv Shiber Orsola De Marco Patrick M. Motl Bradley Munson Dominic Marcello and 8 more

We study the properties of double white dwarf (DWD) mergers by performing hydrodynamic simulations using the new and improved adaptive mesh refinement code Octo-Tiger. We follow the orbital evolution of DWD systems of mass ratio q=0.7 for tens of orbits until and after the merger to investigate them as a possible origin of R Coronae Borealis (RCB) type stars. We reproduce previous results, finding that during the merger, the Helium WD donor star is tidally disrupted within 20-80 minutes since the beginning of the simulation onto the accretor...

10.48550/arxiv.2404.06864 preprint EN arXiv (Cornell University) 2024-04-10

HPX with Spack and Singularity Containers: Evaluating Overheads for HPX/Kokkos using an astrophysics application

OPENALEX - Publications

P. Diehl Steven R. Brandt Gregor Daiß Hartmut Kaiser

Cloud computing for high performance computing resources is an emerging topic. This service is of interest to researchers who care about reproducible computing, for software packages with complex installations, and for companies or researchers who need the compute resources only occasionally or do not want to run and maintain a supercomputer on their own. The connection between HPC and containers is exemplified by the fact that Microsoft Azure's Eagle cloud machine is number three on the November 23 Top 500 list. For cloud services, the application and dependencies are installed in...

10.1007/978-3-031-61763-8_17 preprint EN arXiv (Cornell University) 2024-02-11

Distributed astrophysics simulations using Octo-Tiger with RISC-V CPUs using HPX and Kokkos

OPENALEX - Publications

P. Diehl Gregor Daiß Steven R. Brandt Alireza Kheirkhahan Srinivas Yadav Singanaboina and 4 more

In recent years, interest in RISC-V computing architectures has moved from academic to mainstream, especially in the field of High Performance Computing where energy limitations are increasingly a point of concern. The results presented in this paper are part of a longer-term evaluation of RISC-V's viability for HPC applications. In this work, we use the Octo-Tiger multi-physics, multi-scale, 3D adaptive mesh refinement astrophysics application as the basis for our analysis. We report on our experience porting this modern C++ code (which...

10.48550/arxiv.2407.00026 preprint EN arXiv (Cornell University) 2024-05-10

Hydrodynamic simulations of white dwarf-white dwarf mergers and the origin of R Coronae Borealis stars

OPENALEX - Publications

Sagiv Shiber Orsola De Marco Patrick M. Motl Bradley Munson Dominic Marcello and 8 more

We study the properties of double white dwarf (DWD) mergers by performing hydrodynamic simulations using the new and improved adaptive mesh refinement code octo-tiger. We follow the orbital evolution of DWD systems of mass ratio $q=0.7$ for tens of orbits until and after the merger to investigate them as a possible origin of R Coronae Borealis (RCB) type stars. We reproduce previous results, finding that during the merger, the helium WD donor star is tidally disrupted within 20–80 min since the beginning of the simulation onto...

10.1093/mnras/stae2343 article EN cc-by Monthly Notices of the Royal Astronomical Society 2024-10-23

Asynchronous-Many-Task Systems: Challenges and Opportunities -- Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX

OPENALEX - Publications

Gregor Daiß P. Diehl Jiakun Yan John Holmen Rahulkumar Gayatri and 7 more

Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations, necessitating precise physics resolution in localized areas across expansive domains. Today's supercomputers' extreme heterogeneity presents a significant challenge for dynamically adaptive codes, highlighting the importance of achieving performance portability at scale. Our research focuses on astrophysical simulations, particularly stellar mergers, to elucidate early universe dynamics. We present Octo-Tiger,...

10.48550/arxiv.2412.15518 preprint EN arXiv (Cornell University) 2024-12-19

Evaluating HPX and Kokkos on RISC-V using an astrophysics application Octo-Tiger

OPENALEX - Publications

P. Diehl Gregor Daiß Steven R. Brandt Alireza Kheirkhahan Hartmut Kaiser and 2 more

In recent years, computers based on the RISC-V architecture have raised broad interest in the high-performance computing (HPC) community. As the RISC-V community develops the core instruction set architecture (ISA) along with ISA extensions, the HPC community has been actively ensuring that HPC applications and environments are supported. In this context, assessing the performance of asynchronous many-task runtime systems (AMT) is essential. In this paper, we describe our experience porting a full 3D adaptive mesh-refinement, multi-scale, multi-model,...

10.1145/3624062.3624230 preprint EN cc-by-nc-sa 2023-11-10

Heterogeneous Distributed Big Data Clustering on Sparse Grids

OPENALEX - Publications

David Pfander Gregor Daiß Dirk Pflüger

Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant...

10.20944/preprints201902.0019.v1 preprint EN 2019-02-02

From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types

OPENALEX - Publications

Gregor Daiß Srinivas Yadav Singanaboina P. Diehl Hartmut Kaiser Dirk Pflüger

Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX, Kokkos and explicit SIMD types, aiming to achieve performance-portability for a broad range of heterogeneous hardware. However, on A64FX CPUs, we encountered several missing pieces, hindering performance by causing problems with the SIMD vectorization. Therefore, we add std::experimental::simd as an option to use in Octo-Tiger's Kokkos kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extensions) backend. Additionally, we amend...

10.48550/arxiv.2210.06439 preprint EN other-oa arXiv (Cornell University) 2022-01-01

From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels

OPENALEX - Publications

Gregor Daiß P. Diehl Dominic Marcello Alireza Kheirkhahan Hartmut Kaiser and 1 more

Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: We employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores...

10.48550/arxiv.2210.06438 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Table of Contents

OPENALEX - Publications

Valentin Le Fèvre Gregor Daiß Srinivas Yadav P. Diehl

10.1109/espm256814.2022.00003 article EN 2022-11-01