From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels
Software portability
GPU cluster
DOI: 10.48550/arxiv.2210.06438
Publication Date: 2022-01-01
AUTHORS (6)
ABSTRACT
Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: we employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores to powerful accelerators. There is a missing link, however: while the fine-grained parallelism exposed by HPX is useful for scalability, it can hinder GPU performance when the tasks become too small to saturate the device, causing low resource utilization. To bridge this gap, we investigate multiple different GPU work aggregation strategies within Octo-Tiger, adding one new strategy, and evaluate the node-level performance impact on recent AMD and NVIDIA GPUs, achieving noticeable speedups.
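The general idea of work aggregation can be sketched as follows. This is a minimal illustration in plain C++, not the strategy used in Octo-Tiger and not the HPX or Kokkos API: the class `WorkAggregator`, its `max_batch` parameter, and the placeholder computation are all hypothetical. Small tasks are queued instead of being launched one by one, and a single combined launch processes the whole batch, which is the effect a real implementation would aim for with one GPU kernel covering several tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: not the Octo-Tiger, HPX, or Kokkos API.
// Queues small per-task workloads and processes them in one batched launch.
class WorkAggregator {
public:
    explicit WorkAggregator(std::size_t max_batch) : max_batch_(max_batch) {
        assert(max_batch > 0);
    }

    // Queue one small task. A full batch triggers a combined launch.
    // The caller must keep task_data alive until the batch has been flushed.
    void submit(std::vector<double>* task_data) {
        pending_.push_back(task_data);
        if (pending_.size() >= max_batch_) {
            flush();
        }
    }

    // Run one combined "kernel" over every queued task.
    // On a GPU, the outer loop would map to blocks or teams and the
    // inner loop to threads; here both are plain CPU loops.
    void flush() {
        if (pending_.empty()) {
            return;
        }
        ++launches_;
        for (std::vector<double>* data : pending_) {
            for (double& x : *data) {
                x = 2.0 * x + 1.0;  // placeholder computation (assumption)
            }
        }
        pending_.clear();
    }

    // Number of combined launches issued so far.
    std::size_t launches() const { return launches_; }

private:
    std::size_t max_batch_;
    std::size_t launches_ = 0;
    std::vector<std::vector<double>*> pending_;
};
```

Under these assumptions, submitting ten tasks to an aggregator with `max_batch` set to 4 and then calling `flush()` once results in three launches instead of ten. The trade-off in such a scheme is between fewer, larger launches and the extra time a task may wait for its batch to fill.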