NFDI4DS | UHH-SEMS - Publication Details

From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels

Keywords: Software portability, GPU cluster

DOI: 10.48550/arxiv.2210.06438
Publication Date: 2022-01-01

AUTHORS (6)

Gregor Daiß

P. Diehl

Dominic Marcello

Alireza Kheirkhahan

Hartmut Kaiser

Dirk Pflüger

ABSTRACT

Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: We employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores to powerful accelerators. There is a missing link, however: while the fine-grained parallelism exposed by HPX is useful for scalability, it can hinder GPU performance when the tasks become too small to saturate the device, causing low resource utilization. To bridge this gap, we investigate multiple different GPU work aggregation strategies within Octo-Tiger, adding one new strategy, and evaluate the node-level performance impact on recent AMD and NVIDIA GPUs, achieving noticeable speedups.
