Gregor Daiß

ORCID: 0000-0002-0989-5985
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Gamma-ray bursts and supernovae
  • Caching and Content Delivery
  • Astronomy and Astrophysical Research
  • Astro and Planetary Science
  • Advanced Image and Video Retrieval Techniques
  • Advanced Clustering Algorithms Research
  • Particle accelerators and beam dynamics
  • Cloud Computing and Resource Management
  • Particle Detector Development and Performance
  • Quantum Computing Algorithms and Architecture
  • Solar and Space Plasma Dynamics
  • Computational Physics and Python Applications
  • Pulsars and Gravitational Waves Research
  • Algorithms and Data Compression
  • Stellar, planetary, and galactic studies
  • Real-time simulation and control systems
  • Complex Network Analysis Techniques

Louisiana State University
2022-2024

University of Stuttgart
2018-2024

University of Oregon
2022

Universitat Politècnica de Catalunya
2022

Barcelona Supercomputing Center
2022

Fujitsu (Japan)
2022

Pflüger (Germany)
2022

Ranging from NVIDIA GPUs to AMD and Intel GPUs: Given the heterogeneity of available accelerator cards within current supercomputers, portability is a key aspect for modern HPC applications. In Octo-Tiger, we rely on Kokkos and its various execution spaces for portable compute kernels. In turn, we use HPX to coordinate kernel launches, CPU tasks, and communication. This combination allows us to have a fine interleaving between CPU/GPU computations and communication, enabling scalability on supercomputers. However, work...

10.1145/3585341.3585354 article EN International Workshop on OpenCL 2023-04-06

The increasing availability of machines relying on non-GPU architectures, such as ARM A64FX in high-performance computing, provides a set of interesting challenges to application developers. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures have to be highly adaptable in terms of compute kernel sizes to accommodate execution characteristics for various heterogeneous workloads. In this paper, we demonstrate an approach and performance that...

10.1109/ipdpsw59300.2023.00116 article EN 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2023-05-01

We study the simulation of stellar mergers, which requires complex simulations with high computational demands. We have developed Octo-Tiger, a finite volume grid-based hydrodynamics code with Adaptive Mesh Refinement that is unique in conserving both linear and angular momentum to machine precision. To face the challenge of increasingly complex, diverse, and heterogeneous HPC systems, Octo-Tiger relies on high-level programming abstractions. We use HPX with its futurization capabilities to ensure scalability between nodes...

10.1145/3295500.3356221 preprint EN 2019-11-07

Between a widening range of GPU vendors and the trend of having more GPUs per compute node in supercomputers such as Summit, Perlmutter, Frontier, and Aurora, developing performant yet portable distributed HPC applications becomes ever more challenging. Leveraging existing solutions like Kokkos for platform-independent code and HPX for distributing the application in a task-based fashion can alleviate these challenges. However, using both frameworks in the same application requires them to work together seamlessly. In this paper, we present an...

10.1109/ipdpsw52791.2021.00066 article EN 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2021-06-01

The optimization of the performance of complex simulation codes with high computational demands, such as Octo-Tiger, is an ongoing challenge. Octo-Tiger is an astrophysics code simulating the evolution of star systems based on the fast multipole method, using adaptive octrees as the central data structure. It was implemented with high-level C++ libraries, specifically HPX and Vc, which allows its use on different hardware platforms. Recently, we have demonstrated excellent scalability in a distributed setting.

10.1145/3204919.3204938 article EN 2018-05-02

The increasing availability of machines relying on non-GPU architectures, such as ARM A64FX in high-performance computing, provides a set of interesting challenges to application developers. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures have to be highly adaptable in terms of compute kernel sizes to accommodate execution characteristics for various heterogeneous workloads. In this paper, we demonstrate an approach and performance that...

10.48550/arxiv.2304.11002 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Benchmarking and comparing the performance of a scientific simulation across hardware platforms is a complex task. When the simulation in question is constructed with an asynchronous, many-task (AMT) runtime offloading work to GPUs, the task becomes even more complex. In this paper, we discuss the use of a uniquely suited measurement library, APEX, to capture the behavior of a simulation built on HPX, a highly scalable, distributed AMT runtime. We examine the astrophysics simulation carried out by Octo-Tiger on two different supercomputing architectures. We analyze...

10.48550/arxiv.2210.06437 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant...

10.3390/a12030060 article EN cc-by Algorithms 2019-03-07
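The density-based clustering pipeline summarized in the entry above can be illustrated with a small sketch. The pipeline runs in four steps: estimate the density, build a k-nearest-neighbor graph, drop low-density points, and report connected components as clusters. This is not the published implementation. It assumes a plain unnormalized Gaussian kernel density estimate as a stand-in for the sparse grid density estimation, and it prunes low-density vertices rather than edges. The names `gaussian_kde`, `density_cluster`, `bandwidth`, and `threshold` are hypothetical.

```python
import math
from collections import deque


def gaussian_kde(points, x, bandwidth):
    """Unnormalized Gaussian kernel density at x (stand-in for a sparse grid estimate)."""
    return sum(math.exp(-math.dist(x, p) ** 2 / (2 * bandwidth ** 2))
               for p in points) / len(points)


def density_cluster(points, k=3, bandwidth=1.0, threshold=0.1):
    """Return a cluster id per point, or -1 for points treated as low-density noise."""
    n = len(points)
    density = [gaussian_kde(points, p, bandwidth) for p in points]

    # Build a symmetric k-nearest-neighbor graph over all points.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Hypothetical pruning rule: drop vertices whose density is below the threshold.
    keep = [d >= threshold for d in density]

    # Each connected component of the pruned graph becomes one cluster (BFS).
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if not keep[start] or labels[start] != -1:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if keep[v] and labels[v] == -1:
                    labels[v] = next_label
                    queue.append(v)
        next_label += 1
    return labels
```

The number of clusters is never passed in. It falls out of the graph's connected components, which is what lets density-based methods find clusters of arbitrary, including non-convex, shape. For example, two tight groups of four points far apart plus one isolated point, with `threshold=0.2`, yield labels `0` and `1` for the groups and `-1` for the isolated point.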

In the age of data collection, machine learning algorithms have to be able to efficiently cope with vast data sets. This requires scalable and efficient implementations that can target heterogeneous hardware. We propose a new, performance-portable implementation of a well-known, robust, and versatile multi-class classification method that supports multiple Graphics Processing Units (GPUs) from different vendors. It is based on the approximate k-nearest neighbors (k-NN) algorithm in SYCL. The k-NN assigns a class to a point...

10.1145/3456669.3456692 article EN International Workshop on OpenCL 2021-04-27
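The basic k-NN classification rule mentioned above can be shown as a short sketch: a query point receives the majority class among its k nearest training points. It uses exact brute-force search in plain Python. It is not the approximate, SYCL-based GPU variant described in the entry. The function name `knn_classify` and its parameters are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Exact brute-force k-NN: majority vote among the k nearest training points."""
    if not 0 < k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Order training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    # Ties keep first-encountered order, so the nearest neighbor's class wins a tie.
    return votes.most_common(1)[0][0]
```

The brute-force search costs O(n) distance evaluations per query. Approximate k-NN methods like the one the entry builds on trade some accuracy in the neighbor set for much cheaper searches on large data sets.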

We study the properties of double white dwarf (DWD) mergers by performing hydrodynamic simulations using the new and improved adaptive mesh refinement code Octo-Tiger. We follow the orbital evolution of DWD systems of mass ratio q=0.7 for tens of orbits until after the merger to investigate them as a possible origin for R Coronae Borealis (RCB) type stars. We reproduce previous results, finding that during the merger, the helium WD donor star is tidally disrupted within 20-80 minutes since the beginning of the simulation onto the accretor...

10.48550/arxiv.2404.06864 preprint EN arXiv (Cornell University) 2024-04-10

Cloud computing for high-performance resources is an emerging topic. This service is of interest to researchers who care about reproducible computing, for software packages with complex installations, and for companies or those who need the compute only occasionally and do not want to run and maintain a supercomputer on their own. The connection between HPC and containers is exemplified by the fact that Microsoft Azure's Eagle cloud machine is number three on the November 2023 TOP500 list. For such services, the application and its dependencies are installed in...

10.1007/978-3-031-61763-8_17 preprint EN arXiv (Cornell University) 2024-02-11

In recent years, interest in RISC-V computing architectures has moved from academic to mainstream, especially in the field of High Performance Computing, where energy limitations are increasingly a point of concern. The results presented in this paper are part of a longer-term evaluation of RISC-V's viability for HPC applications. In this work, we use the Octo-Tiger multi-physics, multi-scale, 3D adaptive mesh refinement astrophysics application as the basis of our analysis. We report on our experience porting this modern C++ code (which...

10.48550/arxiv.2407.00026 preprint EN arXiv (Cornell University) 2024-05-10

We study the properties of double white dwarf (DWD) mergers by performing hydrodynamic simulations using the new and improved adaptive mesh refinement code Octo-Tiger. We follow the orbital evolution of DWD systems of mass ratio q=0.7 for tens of orbits until after the merger to investigate them as a possible origin for R Coronae Borealis (RCB) type stars. We reproduce previous results, finding that during the merger, the helium WD donor star is tidally disrupted within 20–80 min since the beginning of the simulation onto...

10.1093/mnras/stae2343 article EN cc-by Monthly Notices of the Royal Astronomical Society 2024-10-23

Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations, necessitating precise physics resolution in localized areas across expansive domains. The extreme heterogeneity of today's supercomputers presents a significant challenge for dynamically adaptive codes, highlighting the importance of achieving performance portability at scale. Our research focuses on astrophysical simulations, particularly stellar mergers, to elucidate early universe dynamics. We present Octo-Tiger,...

10.48550/arxiv.2412.15518 preprint EN arXiv (Cornell University) 2024-12-19

In recent years, computers based on the RISC-V architecture have raised broad interest in the high-performance computing (HPC) community. As the RISC-V community develops the core instruction set architecture (ISA) along with ISA extensions, the HPC community has been actively ensuring that HPC applications and environments are supported. In this context, assessing the performance of asynchronous many-task runtime systems (AMT) is essential. In this paper, we describe our experience porting a full 3D adaptive mesh-refinement, multi-scale, multi-model,...

10.1145/3624062.3624230 preprint EN cc-by-nc-sa 2023-11-10

Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant...

10.20944/preprints201902.0019.v1 preprint EN 2019-02-02

Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX, Kokkos, and explicit SIMD types, aiming to achieve performance portability on a broad range of heterogeneous hardware. However, on A64FX CPUs, we encountered several missing pieces, hindering performance by causing problems with vectorization. Therefore, we add std::experimental::simd as an option to use in Octo-Tiger's kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extensions) backend. Additionally, we amend...

10.48550/arxiv.2210.06439 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: We employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores...

10.48550/arxiv.2210.06438 preprint EN other-oa arXiv (Cornell University) 2022-01-01
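The idea of overlapping communication and computation with futures, as described for HPX in the entry above, can be sketched generically. The sketch starts the communication as an asynchronous task, computes everything that does not depend on it, and only then waits on the future. It uses Python's `concurrent.futures` as a swapped-in stand-in for HPX futures and a periodic 1D three-point average as a toy stencil. It is not Octo-Tiger code. The names `exchange_halo` and `smooth_step` are hypothetical, and `exchange_halo` merely simulates receiving ghost cells from neighbors.

```python
from concurrent.futures import ThreadPoolExecutor


def exchange_halo(grid):
    """Hypothetical stand-in for communication with neighboring subdomains.

    On a periodic 1D grid, the left ghost value is the last cell and the
    right ghost value is the first cell.
    """
    return grid[-1], grid[0]


def smooth_step(grid):
    """One periodic 3-point averaging step, overlapping the halo exchange with interior work."""
    if len(grid) < 3:
        raise ValueError("grid needs at least three cells")
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the "communication" asynchronously; the future stands for its result.
        halo = pool.submit(exchange_halo, grid)
        # Interior cells need no ghost values, so compute them while communication runs.
        interior = [(grid[i - 1] + grid[i] + grid[i + 1]) / 3
                    for i in range(1, len(grid) - 1)]
        # Block only when the boundary cells actually need the ghost values.
        left_ghost, right_ghost = halo.result()
    first = (left_ghost + grid[0] + grid[1]) / 3
    last = (grid[-2] + grid[-1] + right_ghost) / 3
    return [first] + interior + [last]
```

Because of Python's global interpreter lock, this shows the structure of the overlap rather than a real speedup. In an HPX application, the same pattern is expressed with HPX's own futures, and the interior work could run as GPU kernels.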