- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Gamma-ray bursts and supernovae
- Caching and Content Delivery
- Astronomy and Astrophysical Research
- Astro and Planetary Science
- Advanced Image and Video Retrieval Techniques
- Advanced Clustering Algorithms Research
- Particle accelerators and beam dynamics
- Cloud Computing and Resource Management
- Particle Detector Development and Performance
- Quantum Computing Algorithms and Architecture
- Solar and Space Plasma Dynamics
- Computational Physics and Python Applications
- Pulsars and Gravitational Waves Research
- Algorithms and Data Compression
- Stellar, planetary, and galactic studies
- Real-time simulation and control systems
- Complex Network Analysis Techniques
Louisiana State University
2022-2024
University of Stuttgart
2018-2024
University of Oregon
2022
Universitat Politècnica de Catalunya
2022
Barcelona Supercomputing Center
2022
Fujitsu (Japan)
2022
Pflüger (Germany)
2022
Ranging from NVIDIA GPUs to AMD and Intel GPUs: Given the heterogeneity of available accelerator cards within current supercomputers, portability is a key aspect for modern HPC applications. In Octo-Tiger, we rely on Kokkos and its various execution spaces for portable compute kernels. In turn, we use HPX to coordinate kernel launches, CPU tasks, and communication. This combination allows us to have a fine interleaving between CPU/GPU computations and communication, enabling scalability on supercomputers. However, work...
The increasing availability of machines relying on non-GPU architectures, such as ARM A64FX, in high-performance computing provides a set of interesting challenges to application developers. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures have to be highly adaptable in terms of compute kernel sizes to accommodate different execution characteristics for various heterogeneous workloads. In this paper, we demonstrate an approach to code and performance portability that...
We study the simulation of stellar mergers, which requires complex simulations with high computational demands. We have developed Octo-Tiger, a finite volume grid-based hydrodynamics code with Adaptive Mesh Refinement which is unique in conserving both linear and angular momentum to machine precision. To face the challenge of increasingly complex, diverse, and heterogeneous HPC systems, Octo-Tiger relies on high-level programming abstractions. We use HPX with its futurization capabilities to ensure scalability between nodes...
Between a widening range of GPU vendors and the trend of having more GPUs per compute node in supercomputers such as Summit, Perlmutter, Frontier, and Aurora, developing performant yet portable distributed HPC applications becomes ever more challenging. Leveraging existing solutions like Kokkos for platform-independent code and HPX for distributing the application in a task-based fashion can alleviate these challenges. However, using both frameworks in the same application requires them to work together seamlessly. In this work, we present an...
The optimization of the performance of complex simulation codes with high computational demands, such as Octo-Tiger, is an ongoing challenge. Octo-Tiger is an astrophysics code simulating the evolution of star systems based on the fast multipole method, using adaptive octrees as the central data structure. Octo-Tiger was implemented using high-level C++ libraries, specifically HPX and Vc, which allows its use on different hardware platforms. Recently, we have demonstrated excellent scalability in a distributed setting.
Benchmarking and comparing the performance of a scientific simulation across hardware platforms is a complex task. When the simulation in question is constructed with an asynchronous, many-task (AMT) runtime offloading work to GPUs, the task becomes even more complex. In this paper, we discuss the use of a uniquely suited performance measurement library, APEX, to capture the performance behavior of a simulation built on HPX, a highly scalable, distributed AMT runtime. We examine astrophysics simulations carried out by Octo-Tiger on two different supercomputing architectures. We analyze...
Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant...
In the age of data collection, machine learning algorithms have to be able to efficiently cope with vast data sets. This requires scalable algorithms and efficient implementations that can cope with heterogeneous hardware. We propose a new, performance-portable implementation of a well-known, robust, and versatile multi-class classification method that supports multiple Graphics Processing Units (GPUs) from different vendors. It is based on the approximate k-nearest neighbors (k-NN) algorithm in SYCL. The k-NN algorithm assigns a class to a data point...
We study the properties of double white dwarf (DWD) mergers by performing hydrodynamic simulations using the new and improved adaptive mesh refinement code Octo-Tiger. We follow the orbital evolution of DWD systems of mass ratio q=0.7 for tens of orbits until and after the merger to investigate them as a possible origin of R Coronae Borealis (RCB) type stars. We reproduce previous results, finding that during the merger, the helium WD donor star is tidally disrupted within 20-80 minutes since the beginning of the simulation onto the accretor...
Cloud computing for high performance computing resources is an emerging topic. This service is of interest to researchers who care about reproducible computing, for software packages with complex installations, and for companies or researchers who need the compute resources only occasionally and do not want to run and maintain a supercomputer on their own. The connection between HPC and containers is exemplified by the fact that Microsoft Azure's Eagle cloud machine is number three on the November 2023 Top 500 list. For cloud services, application dependencies are installed in...
In recent years, interest in RISC-V computing architectures has moved from academic to mainstream, especially in the field of High Performance Computing, where energy limitations are increasingly a point of concern. The results presented in this paper are part of a longer-term evaluation of RISC-V's viability for HPC applications. In this work, we use the Octo-Tiger multi-physics, multi-scale, 3D adaptive mesh refinement astrophysics application as the basis for our analysis. We report on our experience porting this modern C++ code (which...
We study the properties of double white dwarf (DWD) mergers by performing hydrodynamic simulations using the new and improved adaptive mesh refinement code octo-tiger. We follow the orbital evolution of DWD systems of mass ratio $q=0.7$ for tens of orbits until and after the merger to investigate them as a possible origin of R Coronae Borealis (RCB) type stars. We reproduce previous results, finding that during the merger, the helium WD donor star is tidally disrupted within 20–80 min since the beginning of the simulation onto...
Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations, necessitating precise physics resolution in localized areas across expansive domains. Today's supercomputers' extreme heterogeneity presents a significant challenge for dynamically adaptive mesh refinement codes, highlighting the importance of achieving performance portability at scale. Our research focuses on astrophysical simulations, particularly stellar mergers, to elucidate early universe dynamics. We present Octo-Tiger,...
In recent years, computers based on the RISC-V architecture have raised broad interest in the high-performance computing (HPC) community. As the RISC-V community develops the core instruction set architecture (ISA) along with ISA extensions, the HPC community has been actively ensuring HPC applications and environments are supported. In this context, assessing the performance of asynchronous many-task runtime systems (AMT) is essential. In this paper, we describe our experience porting a full 3D adaptive mesh-refinement, multi-scale, multi-model,...
Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX, Kokkos, and explicit SIMD types, aiming to achieve performance-portability for a broad range of heterogeneous hardware. However, on A64FX CPUs, we encountered several missing pieces, hindering performance by causing problems with the SIMD vectorization. Therefore, we add std::experimental::simd as an option to use in Octo-Tiger's Kokkos kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extensions) SIMD backend. Additionally, we amend...
Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this challenge with existing solutions: We employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores...