Julian Bellavita

ORCID: 0000-0003-1375-5720
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Matrix Theory and Algorithms
  • Face and Expression Recognition
  • Algorithms and Data Compression
  • Low-power high-performance VLSI design
  • Tensor decomposition and applications
  • Error Correcting Code Techniques
  • Advanced Clustering Algorithms Research
  • Numerical Methods and Algorithms

University of Trento
2025

Cornell University
2024-2025

Lawrence Berkeley National Laboratory
2023

The optimization of matrix multiplication (or GEMM) has been a need during the last decades. This operation is considered a flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel oneAPI because of its widespread use in a large variety of scientific applications. GEMM is usually implemented following the GotoBLAS philosophy, which tiles the operands and uses a series of nested loops for performance improvement. These approaches extract the maximum computational power of the architectures through small pieces...

10.1109/cgo57630.2024.10444883 article EN 2024-02-28
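The GotoBLAS loop structure described in the abstract can be sketched as a blocked matrix multiplication. This is a minimal NumPy illustration, not the paper's implementation: the function name and tile sizes are illustrative, and the inner `@` stands in for the hand-tuned packed micro-kernel a real BLAS library would use.

```python
import numpy as np

def tiled_gemm(A, B, mc=64, kc=64, nc=64):
    """Blocked GEMM sketch in the GotoBLAS style: tile the operands so each
    block fits in a level of the cache hierarchy, then multiply block by
    block. Tile sizes mc/kc/nc are illustrative, not tuned for any machine."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for jc in range(0, n, nc):          # column panels of B and C
        for pc in range(0, k, kc):      # panels along the shared dimension
            for ic in range(0, m, mc):  # row panels of A and C
                # In a real library this is a packed micro-kernel;
                # here NumPy's matmul plays that role.
                C[ic:ic+mc, jc:jc+nc] += (
                    A[ic:ic+mc, pc:pc+kc] @ B[pc:pc+kc, jc:jc+nc]
                )
    return C
```

The loop ordering (jc, pc, ic) mirrors the outer three loops of the GotoBLAS design, where each level of tiling targets a different level of the memory hierarchy.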

K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback is its inability to identify non-linearly separable clusters, which may lead to inaccurate solutions in certain cases. Kernel K-means is a variant of the classical algorithm that can find such clusters. However, it scales quadratically with respect to the size of the dataset, taking several minutes to cluster even medium-sized datasets on traditional CPU-based machines. In this paper, we present a formulation using...

10.1145/3710848.3710887 preprint EN 2025-02-28
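The quadratic scaling the abstract mentions comes from materializing the full n-by-n kernel matrix. A minimal sketch of kernel K-means with an RBF kernel, assuming Lloyd-style iterations (the function name, kernel choice, and parameters are illustrative, not the paper's method):

```python
import numpy as np

def kernel_kmeans(X, k, iters=20, gamma=1.0, seed=0):
    """Kernel K-means sketch with an RBF kernel. Building the dense n-by-n
    kernel matrix K is what makes the algorithm scale quadratically in the
    dataset size, in both memory and time."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    # RBF kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=n)
    for _ in range(iters):
        dist = np.full((n, k), np.inf)  # empty clusters stay unselectable
        for c in range(k):
            mask = labels == c
            nc = mask.sum()
            if nc == 0:
                continue
            # Squared distance of every point to cluster c's mean in feature
            # space: K_ii - (2/|c|) * sum_j K_ij + (1/|c|^2) * sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, mask].sum(axis=1) / nc
                          + K[np.ix_(mask, mask)].sum() / nc**2)
        labels = dist.argmin(axis=1)
    return labels
```

Because distances are computed entirely through kernel evaluations, the algorithm can separate clusters that are not linearly separable in the input space.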

Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel codes that can efficiently run on today's large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, codes must be adapted to offload expensive...

10.1145/3624062.3624600 article EN cc-by 2023-11-10
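The factorization kernels a heterogeneous code would offload can be seen in a blocked right-looking Cholesky. This is a dense NumPy sketch, not the paper's sparse distributed implementation: in practice the trailing-submatrix updates (the GEMM/SYRK step below) dominate the cost and are the natural candidates for GPU offload.

```python
import numpy as np

def blocked_cholesky(A, bs=2):
    """Blocked right-looking Cholesky sketch on a dense SPD matrix.
    The three steps per block column mirror the LAPACK building blocks:
    POTRF (diagonal block), TRSM (panel), and SYRK/GEMM (trailing update).
    Block size bs is illustrative."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, bs):
        e = min(k + bs, n)
        # Factor the diagonal block: A11 = L11 * L11^T  (POTRF)
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        if e < n:
            # Solve the panel below: L21 = A21 * L11^{-T}  (TRSM)
            A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
            # Trailing update: A22 -= L21 * L21^T  (SYRK/GEMM)
            A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
    return np.tril(A)
```

A sparse supernodal factorization follows the same three-step pattern per supernode, with a sparsity-aware elimination order; the expensive dense updates are what get shipped to the GPU.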

The optimization of matrix multiplication (or GEMM) has been a need during the last decades. This operation is considered a flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel oneAPI because of its widespread use in a large variety of scientific applications. GEMM is usually implemented following the GotoBLAS philosophy, which tiles the operands and uses a series of nested loops for performance improvement. These approaches extract the maximum computational power of the architectures through small pieces...

10.48550/arxiv.2310.17408 preprint EN cc-by arXiv (Cornell University) 2023-01-01