- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Real-Time Systems Scheduling
- Software System Performance and Reliability
- Embedded Systems Design Techniques
- Network Security and Intrusion Detection
- Radiation Effects in Electronics
- Anomaly Detection Techniques and Applications
- Low-power high-performance VLSI design
- Advancements in Semiconductor Devices and Circuit Design
- Protein Structure and Dynamics
- Distributed systems and fault tolerance
- Advanced Software Engineering Methodologies
- Semiconductor materials and devices
- Network Packet Processing and Optimization
- Advanced Chemical Physics Studies
- Software Reliability and Analysis Research
- Computational Drug Discovery Methods
- Innovative Microfluidic and Catalytic Techniques
- Model-Driven Software Engineering Techniques
- Engineering Applied Research
- IoT and Edge/Fog Computing
- Heat Transfer and Optimization
- Cineca, 2018-2025
- Consorzio Interuniversitario Nazionale per l'Informatica, 2022-2024
- University of Bologna, 2014-2022
- University of Michigan, 2022
- ETH Zurich, 2022
- Tampere University, 2022
- Infineon Technologies (Germany), 2022
- Queen's University Belfast, 2022
- University of California, Berkeley, 2022
- Infineon Technologies (United Kingdom), 2022
In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the urge for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in applications. However, efficiently supporting this paradigm on PMCAs is challenging, due to large time and space...
Large-scale computing clusters have been the basis of scientific progress for several decades and have now become a commodity fuelling the AI revolution. Dark silicon, energy efficiency, power consumption, and hot spots are no longer looming threats for an Information and Communication Technologies (ICT) niche, but are today a limiting factor for the capability of the entire human society and a contributor to global carbon emissions. However, from the end-user, system-administrator, and integrator perspective, handling and optimising these...
Energy efficiency and datacentre automation are critical targets of the research and deployment agenda of CINECA and its partners in the Efficient System Laboratory of the University of Bologna and the Integrated Systems Laboratory of ETH Zurich. In this manuscript, we present the primary outcomes of the work conducted in this domain under the umbrella of several European, national, and private funding schemes. These consist of: (i) ExaMon, a scalable, flexible, and holistic monitoring framework, which is capable of ingesting 70 GB/day of telemetry data from the entire facility and linking it with machine learning...
The new open and royalty-free RISC-V ISA is attracting interest across the whole computing continuum, from microcontrollers to supercomputers. High-performance processors and accelerators have been announced, but RISC-V-based HPC systems will need a holistic co-design effort, spanning the memory and storage hierarchy, interconnects, and the full software stack. In this paper, we describe Monte Cimone, a fully operational, multi-blade computer prototype and hardware-software test-bed based on the U740, a double-precision...
Energy-efficient computing uses power management techniques such as frequency scaling to save energy. Implementing energy-efficient computing on large-scale systems is challenging for several reasons. While most modern architectures, including GPUs, are capable of frequency scaling, these features are often not available on large-scale systems. In addition, achieving higher energy savings requires precise tuning, because not only different applications but also different kernels can have different energy characteristics. We propose SYnergy, a novel...
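As an illustration of the general per-kernel tuning idea (a sketch, not SYnergy's actual algorithm), one can pick, for each kernel, the lowest-energy frequency whose runtime stays within a slowdown budget relative to the fastest setting; all kernel names, frequencies, and measurements below are hypothetical placeholders.

```c
/* Illustrative per-kernel frequency selection (not SYnergy's algorithm):
 * choose the lowest-energy frequency whose runtime stays within a slowdown
 * budget. All profiles are invented. Compile: cc -std=c99 synergy_sketch.c */
#include <stdio.h>

#define N_FREQS 4

typedef struct {
    const char *kernel;
    double time_s[N_FREQS];   /* runtime at each candidate frequency  */
    double energy_j[N_FREQS]; /* energy at each candidate frequency   */
} kernel_profile_t;

static const double freqs_mhz[N_FREQS] = {1500, 1200, 1000, 800};

/* Index of the frequency minimizing energy while keeping
 * runtime <= (1 + max_slowdown) * best_runtime. */
static int pick_frequency(const kernel_profile_t *k, double max_slowdown)
{
    double best_time = k->time_s[0];
    for (int f = 1; f < N_FREQS; f++)
        if (k->time_s[f] < best_time) best_time = k->time_s[f];

    int best = -1;
    for (int f = 0; f < N_FREQS; f++) {
        if (k->time_s[f] > (1.0 + max_slowdown) * best_time) continue;
        if (best < 0 || k->energy_j[f] < k->energy_j[best]) best = f;
    }
    return best;
}

int main(void)
{
    /* Hypothetical profiles: a compute-bound and a memory-bound kernel. */
    kernel_profile_t kernels[] = {
        {"dense_gemm",  {1.0, 1.22, 1.45, 1.80}, {120, 118, 121, 130}},
        {"stream_copy", {2.0, 2.02, 2.05, 2.10}, {200, 170, 150, 138}},
    };
    for (unsigned i = 0; i < sizeof kernels / sizeof kernels[0]; i++) {
        int f = pick_frequency(&kernels[i], 0.10); /* 10% slowdown budget */
        printf("%s -> %.0f MHz\n", kernels[i].kernel, freqs_mhz[f]);
    }
    return 0;
}
```

With these placeholder numbers, the compute-bound kernel keeps the top frequency while the memory-bound one drops to the lowest, which is the intuition behind per-kernel rather than per-application tuning.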
Manufacturing and environmental variations cause timing errors that are typically avoided by conservative design guardbands or corrected by circuit-level error detection and correction. These measures incur energy and performance penalties. This paper considers methods to reduce this cost by expanding the scope of variability mitigation through the software stack. In particular, we propose deploying workloads according to their error likelihood in shared-memory clusters of processor cores, among other countermeasures incorporated in a runtime layer for OpenMP...
Energy and power consumption are prominent issues in today's supercomputers and are foreseen as a limiting factor of future installations. In scientific computing, a significant amount of energy is spent in the communication and synchronization-related idle times among distributed processes participating in the same application. However, due to the time scale at which this happens, taking advantage of low-power states to reduce the power of computing resources may introduce overheads.
Processors for high-performance computing and server workloads are today thermally constrained. To preserve a safe working temperature, state-of-the-art processors in this market segment integrate many cores on the same die and feature fine-grain power management and thermal feedback loops implemented in hardware. However, to keep the control policy simple, these controllers fail to take advantage of the underlying heterogeneity, long transients, and specific user modes. In this paper, we present a self-aware framework making...
Designing and optimizing applications for energy-efficient High Performance Computing systems up to the Exascale era is an extremely challenging problem. This paper presents the toolbox developed in the ANTAREX European project for autotuning and adaptivity in energy-efficient HPC systems. In particular, the modules of the toolbox are described, as well as some preliminary results of its application to two target use cases.
Power and energy consumption are becoming key challenges in the supercomputers' exascale race. HPC systems' processors waste active power during communication and synchronization among MPI processes in large-scale applications. However, due to the time scale at which this happens, transitioning into low-power states while waiting for the completion of each phase may introduce unacceptable overhead. In this article, we present COUNTDOWN, a run-time library for identifying and automatically reducing the CPUs' power consumption during synchronization...
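The underlying mechanism can be sketched with the standard MPI profiling interface (PMPI), which lets a library intercept blocking calls and request a lower power state before the wait. This is a minimal illustration, not COUNTDOWN's actual implementation, and the two power helpers are hypothetical stand-ins for a real DVFS or C-state knob.

```c
/* Sketch of intercepting a blocking MPI call via the standard PMPI
 * profiling interface and lowering the CPU power state while waiting.
 * NOT the actual COUNTDOWN code; the power helpers are hypothetical.
 * Build, e.g.: mpicc -shared -fPIC countdown_sketch.c -o libwrap.so */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical knobs: a real system would write to a DVFS or idle-state
 * interface (e.g. cpufreq); here they only log the intent. */
static void set_cpu_low_power(void)  { fprintf(stderr, "[wrap] low power\n"); }
static void set_cpu_high_power(void) { fprintf(stderr, "[wrap] high power\n"); }

/* Intercept MPI_Barrier: drop to a low-power state for the (potentially
 * long) wait, then restore full performance before returning to compute. */
int MPI_Barrier(MPI_Comm comm)
{
    set_cpu_low_power();
    int err = PMPI_Barrier(comm);   /* forward to the real implementation */
    set_cpu_high_power();
    return err;
}
```

The same pattern applies to other blocking primitives (waits, collectives); the difficulty highlighted in the abstract is deciding when the wait is long enough to amortize the power-state transition.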
Manufacturing and environmental variations cause timing errors in microelectronic processors that are typically avoided by ultra-conservative multi-corner design margins or corrected by error detection and recovery mechanisms at the circuit level. In contrast, we present here runtime software support for cost-effective countermeasures against hardware failures during system operation. We propose a variability-aware OpenMP (VOMP) programming environment, suitable for tightly-coupled shared-memory...
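As a rough illustration of variability-aware work distribution (not VOMP's actual scheduler), the sketch below partitions loop iterations across cores in proportion to one minus a per-core timing-error estimate; the error probabilities are invented.

```c
/* Illustrative variability-aware partitioning: cores with a higher
 * estimated timing-error likelihood receive fewer iterations.
 * The error estimates are invented; VOMP's real policy differs. */
#include <stdio.h>

#define N_CORES 4

int main(void)
{
    /* Hypothetical per-core timing-error likelihood estimates. */
    double error_prob[N_CORES] = {0.01, 0.02, 0.10, 0.05};
    int total_iters = 1000;

    /* Weight each core by (1 - error_prob): error-prone cores get less. */
    double weight[N_CORES], sum = 0.0;
    for (int c = 0; c < N_CORES; c++) {
        weight[c] = 1.0 - error_prob[c];
        sum += weight[c];
    }

    int assigned = 0;
    for (int c = 0; c < N_CORES; c++) {
        int share = (c == N_CORES - 1)
                        ? total_iters - assigned              /* remainder */
                        : (int)(total_iters * weight[c] / sum);
        assigned += share;
        printf("core %d (p_err=%.2f): %d iterations\n", c, error_prob[c], share);
    }
    return 0;
}
```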
Fine-grain time synchronization is important to address several challenges in today's and future High Performance Computing (HPC) centers. Among the many, (i) co-scheduling techniques for parallel applications with sensitive bulk-synchronous workloads, (ii) performance analysis tools, and (iii) autotuning strategies that want to exploit state-of-the-art (SoA) high-resolution monitoring systems are three examples where an accuracy of a few microseconds is required. Previous works report custom solutions to reach this without...
In this manuscript, we evaluate the impact of hardware power-capping mechanisms on a real scientific application composed of parallel execution phases. By comparing the capping mechanism against static frequency allocation schemes, we show that a speed-up can be achieved if the power constraint is enforced on average during the run instead of over short time periods. RAPL, which enforces the cap at a few-millisecond scale, fails to share the power budget between the more demanding and the less demanding phases.
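For reference, on Linux the RAPL package limit is commonly exposed through the powercap sysfs tree. The sketch below reads and adjusts it, assuming the conventional intel-rapl:0 zone path (which varies by platform) and root privileges for writing.

```c
/* Minimal sketch of inspecting/adjusting an Intel RAPL package power limit
 * through the Linux powercap sysfs interface. The zone path is an
 * assumption (it varies by machine); writing requires root. */
#include <stdio.h>

#define LIMIT_FILE "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"

int main(void)
{
    long long limit_uw = 0;
    FILE *f = fopen(LIMIT_FILE, "r");
    if (!f || fscanf(f, "%lld", &limit_uw) != 1) {
        fprintf(stderr, "cannot read %s\n", LIMIT_FILE);
        if (f) fclose(f);
        return 1;
    }
    fclose(f);
    printf("current package power limit: %.1f W\n", limit_uw / 1e6);

    /* Example: cap the package at 100 W (100e6 microwatts). */
    f = fopen(LIMIT_FILE, "w");
    if (!f) { fprintf(stderr, "need root to write the limit\n"); return 1; }
    fprintf(f, "%lld\n", 100000000LL);
    fclose(f);
    return 0;
}
```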
We are entering the era of thermally-bound computing: advanced and costly cooling solutions are needed to sustain high computing densities in high-performance equipment. To reduce costs and overprovisioning, dynamic thermal management (DTM) strategies aim at controlling device temperature by modulating online the performance of processing elements. While operating systems allow migration of threads between cores, in HPC parallel applications are pinned to the allocated cores at start-time to avoid job-migration overheads. In...
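A minimal sketch of thermally-aware mapping for pinned threads (not the paper's algorithm): greedily assign the most power-hungry threads to the coolest cores. All powers and temperatures below are invented placeholders.

```c
/* Illustrative greedy thermal-aware mapping: the most power-hungry threads
 * are paired with the coolest cores. Inputs are hypothetical. */
#include <stdio.h>

#define N 4

int main(void)
{
    double thread_power[N] = {12.0, 30.0, 18.0, 25.0}; /* watts   */
    double core_temp[N]    = {72.0, 65.0, 80.0, 60.0}; /* celsius */
    int thread_order[N], core_order[N];

    for (int i = 0; i < N; i++) thread_order[i] = core_order[i] = i;

    /* Sort indices: threads by descending power, cores by ascending temp. */
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            if (thread_power[thread_order[j]] > thread_power[thread_order[i]]) {
                int t = thread_order[i];
                thread_order[i] = thread_order[j]; thread_order[j] = t;
            }
            if (core_temp[core_order[j]] < core_temp[core_order[i]]) {
                int t = core_order[i];
                core_order[i] = core_order[j]; core_order[j] = t;
            }
        }

    /* Greedy pairing: most power-hungry thread -> coolest core. */
    for (int i = 0; i < N; i++)
        printf("thread %d (%.0f W) -> core %d (%.0f C)\n",
               thread_order[i], thread_power[thread_order[i]],
               core_order[i], core_temp[core_order[i]]);
    return 0;
}
```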
The power consumption of supercomputers is a major challenge for system owners, users, and society. It limits the capacity of installations, requires large cooling infrastructures, and causes a large carbon footprint. Reducing power during application execution without changing the source code or increasing time-to-completion is highly desirable in real-life high-performance computing scenarios. The power-management run-time frameworks proposed in the last decade are based on the assumption that the duration of the communication phases of an MPI application can...
Manycore accelerators have recently proven a promising solution for increasingly powerful and energy-efficient computing systems. This raises the need for parallel programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering flexible support for fine-grained and irregular parallelism. However, efficiently supporting this paradigm on resource-constrained platforms is a challenging task. In this paper, we present an...
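The paradigm itself can be illustrated with plain OpenMP tasking (standard OpenMP, not the accelerator runtime presented in the paper): irregular, recursive work expressed as fine-grained tasks with a cut-off to limit overhead.

```c
/* Plain OpenMP tasking example of fine-grained, irregular parallelism
 * (standard OpenMP; not the paper's accelerator runtime).
 * Compile with: cc -fopenmp tasks.c */
#include <stdio.h>

/* Recursive Fibonacci: each call spawns irregular child tasks. */
static long fib(int n)
{
    if (n < 2) return n;
    long a, b;
    #pragma omp task shared(a) if(n > 20)   /* cut-off: keep tiny tasks inline */
    a = fib(n - 1);
    b = fib(n - 2);
    #pragma omp taskwait                    /* wait for the child task */
    return a + b;
}

int main(void)
{
    long r;
    #pragma omp parallel
    #pragma omp single                      /* one thread builds the task tree */
    r = fib(30);
    printf("fib(30) = %ld\n", r);
    return 0;
}
```

The challenge the abstract refers to is supporting exactly this kind of dynamic task creation and synchronization efficiently when cores, memory, and runtime overhead budgets are tightly constrained.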
High-performance computational kernels that optimally exploit modern vector-capable processors are critical for running large-scale drug discovery campaigns efficiently and promptly, compatible with the constraints posed by urgent computing needs. Yet, state-of-the-art virtual screening workflows focus either on the broadness of the features provided to the researcher or on performance on high-throughput accelerators, leaving the task of deploying efficient CPU code to the compiler. We ported key parts of the LiGen pipeline, based...
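A minimal sketch of the kind of vector-friendly kernel this targets (not LiGen code): a contiguous accumulation loop annotated with OpenMP SIMD so the compiler can emit vector instructions.

```c
/* Minimal vector-friendly kernel (not LiGen code): a scoring-style
 * accumulation over contiguous arrays, annotated with OpenMP SIMD.
 * Compile with: cc -O3 -fopenmp-simd simd.c */
#include <stdio.h>

#define N 1024

/* Sum of pairwise products, e.g. a score-like accumulation. */
static double score(const double *a, const double *b, int n)
{
    double s = 0.0;
    #pragma omp simd reduction(+:s)
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

int main(void)
{
    static double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = i * 0.5; b[i] = 1.0 / (i + 1); }
    printf("score = %f\n", score(a, b, N));
    return 0;
}
```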
The main limitation of applying predictive tools to large-scale supercomputers is the complexity of deploying Artificial Intelligence (AI) services in production and of modeling heterogeneous data sources while preserving topological information in compact models. This paper proposes GRAAFE, a framework for continuously predicting compute node failures on the Marconi100 supercomputer. It consists of (i) an anomaly prediction model based on graph neural networks (GNNs) that leverages the nodes' physical layout in the room...
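To make the GNN component concrete, the sketch below performs one mean-aggregation message-passing step over a tiny node graph; the graph, the features, and the absence of learned weights are simplifying assumptions, not the GRAAFE model or Marconi100 data.

```c
/* Tiny illustration of one graph-neural-network message-passing step:
 * each node's new feature is the mean of its own and its neighbours'
 * features (no learned weights). Graph and features are invented. */
#include <stdio.h>

#define N_NODES 4
#define N_FEATS 2

int main(void)
{
    /* Adjacency of a small "rack": node i connects to node j if adj=1. */
    int adj[N_NODES][N_NODES] = {
        {0, 1, 1, 0},
        {1, 0, 1, 0},
        {1, 1, 0, 1},
        {0, 0, 1, 0},
    };
    /* Per-node features, e.g. normalized temperature and load. */
    double x[N_NODES][N_FEATS] = {
        {0.2, 0.9}, {0.3, 0.8}, {0.9, 0.4}, {0.1, 0.1},
    };
    double h[N_NODES][N_FEATS];

    for (int i = 0; i < N_NODES; i++) {
        int deg = 0;
        for (int f = 0; f < N_FEATS; f++) h[i][f] = x[i][f]; /* self term */
        for (int j = 0; j < N_NODES; j++) {
            if (!adj[i][j]) continue;
            deg++;
            for (int f = 0; f < N_FEATS; f++) h[i][f] += x[j][f];
        }
        for (int f = 0; f < N_FEATS; f++) h[i][f] /= (deg + 1); /* mean */
        printf("node %d: (%.2f, %.2f)\n", i, h[i][0], h[i][1]);
    }
    return 0;
}
```

Stacking such aggregation steps, with learned weights and the room layout as the graph, is the general mechanism by which a GNN can exploit topological information for node-level failure prediction.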