- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Real-Time Systems Scheduling
- Software System Performance and Reliability
- Embedded Systems Design Techniques
- Network Security and Intrusion Detection
- Radiation Effects in Electronics
- Anomaly Detection Techniques and Applications
- Low-power high-performance VLSI design
- Advancements in Semiconductor Devices and Circuit Design
- Protein Structure and Dynamics
- Distributed systems and fault tolerance
- Advanced Software Engineering Methodologies
- Semiconductor materials and devices
- Network Packet Processing and Optimization
- Advanced Chemical Physics Studies
- Software Reliability and Analysis Research
- Computational Drug Discovery Methods
- Innovative Microfluidic and Catalytic Techniques
- Model-Driven Software Engineering Techniques
- Engineering Applied Research
- IoT and Edge/Fog Computing
- Heat Transfer and Optimization
- Cineca, 2018-2025
- Consorzio Interuniversitario Nazionale per l'Informatica, 2022-2024
- University of Bologna, 2014-2022
- University of Michigan, 2022
- ETH Zurich, 2022
- Tampere University, 2022
- Infineon Technologies (Germany), 2022
- Queen's University Belfast, 2022
- University of California, Berkeley, 2022
- Infineon Technologies (United Kingdom), 2022
In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the urge for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in applications. However, efficiently supporting this paradigm on PMCAs is challenging, due to large time and space...
Large-scale computing clusters have been the basis of scientific progress for several decades and have now become a commodity fuelling the AI revolution. Dark silicon, energy efficiency, power consumption, and hot spots are no longer looming threats for an Information and Communication Technologies (ICT) niche, but are today a limiting factor for the capability of the entire human society and a contributor to global carbon emissions. However, from the end-user, system-administrator, and integrator perspective, handling and optimising these...
Energy efficiency and datacentre automation are critical targets of the research and deployment agenda of CINECA and its partners in the Efficient System Laboratory of the University of Bologna and the Integrated Systems Laboratory of ETH Zurich. In this manuscript, we present the primary outcomes of the work conducted in this domain under the umbrella of several European, national, and private funding schemes. These consist of: (i) ExaMon, a scalable, flexible, and holistic monitoring framework, which is capable of ingesting 70 GB/day of telemetry data from the entire facility and linking it with machine learning...
The new open and royalty-free RISC-V ISA is attracting interest across the whole computing continuum, from microcontrollers to supercomputers. High-performance processors and accelerators have been announced, but RISC-V-based HPC systems will need a holistic co-design effort, spanning the memory and storage hierarchy, interconnects, and the full software stack. In this paper, we describe Monte Cimone, a fully operational, multi-blade computer prototype and hardware-software test-bed based on the U740, a double-precision...
Energy-efficient computing uses power management techniques such as frequency scaling to save energy. Implementing energy-efficient computing on large-scale systems is challenging for several reasons. While most modern architectures, including GPUs, are capable of frequency scaling, these features are often not available on large-scale systems. In addition, achieving higher energy savings requires precise tuning, because not only different applications but also different kernels can have different energy characteristics. We propose SYnergy, a novel...
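As an illustration of the general per-kernel tuning idea (a sketch, not SYnergy's actual algorithm), one can pick, for each kernel, the lowest-energy frequency whose runtime stays within a slowdown budget relative to the fastest setting; all kernel names, frequencies, and measurements below are hypothetical placeholders.

```c
/* Illustrative per-kernel frequency selection (not SYnergy's algorithm):
 * choose the lowest-energy frequency whose runtime stays within a slowdown
 * budget. All profiles are invented. Compile: cc -std=c99 synergy_sketch.c */
#include <stdio.h>

#define N_FREQS 4

typedef struct {
    const char *kernel;
    double time_s[N_FREQS];   /* runtime at each candidate frequency  */
    double energy_j[N_FREQS]; /* energy at each candidate frequency   */
} kernel_profile_t;

static const double freqs_mhz[N_FREQS] = {1500, 1200, 1000, 800};

/* Index of the frequency minimizing energy while keeping
 * runtime <= (1 + max_slowdown) * best_runtime. */
static int pick_frequency(const kernel_profile_t *k, double max_slowdown)
{
    double best_time = k->time_s[0];
    for (int f = 1; f < N_FREQS; f++)
        if (k->time_s[f] < best_time) best_time = k->time_s[f];

    int best = -1;
    for (int f = 0; f < N_FREQS; f++) {
        if (k->time_s[f] > (1.0 + max_slowdown) * best_time) continue;
        if (best < 0 || k->energy_j[f] < k->energy_j[best]) best = f;
    }
    return best;
}

int main(void)
{
    /* Hypothetical profiles: a compute-bound and a memory-bound kernel. */
    kernel_profile_t kernels[] = {
        {"dense_gemm",  {1.0, 1.22, 1.45, 1.80}, {120, 118, 121, 130}},
        {"stream_copy", {2.0, 2.02, 2.05, 2.10}, {200, 170, 150, 138}},
    };
    for (unsigned i = 0; i < sizeof kernels / sizeof kernels[0]; i++) {
        int f = pick_frequency(&kernels[i], 0.10); /* 10% slowdown budget */
        printf("%s -> %.0f MHz\n", kernels[i].kernel, freqs_mhz[f]);
    }
    return 0;
}
```

With these placeholder numbers, the compute-bound kernel keeps the top frequency while the memory-bound one drops to the lowest, which is the intuition behind per-kernel rather than per-application tuning.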
Manufacturing and environmental variations cause timing errors that are typically avoided by conservative design guardbands or corrected by circuit-level error detection and correction. These measures incur energy and performance penalties. This paper considers methods to reduce this cost by expanding the scope of variability mitigation through the software stack. In particular, we propose deploying workloads according to their error likelihood in shared-memory clusters of processor cores, among other countermeasures incorporated in a runtime layer for OpenMP...
Energy and power consumption are prominent issues in today's supercomputers and are foreseen as a limiting factor of future installations. In scientific computing, a significant amount of energy is spent in the communication and synchronization-related idle times among distributed processes participating in the same application. However, due to the time scale at which this happens, taking advantage of low-power states to reduce the power of computing resources may introduce overheads.
Processors for high-performance computing and server workloads are today thermally constrained. To preserve a safe working temperature, state-of-the-art processors in this market segment integrate many cores on the same die and feature fine-grain power management and thermal feedback loops implemented in hardware. However, to keep the control policy simple, these controllers fail to take advantage of the underlying heterogeneity, long transients, and specific user modes. In this paper, we present a self-aware framework making...
Designing and optimizing applications for energy-efficient High Performance Computing systems up to the Exascale era is an extremely challenging problem. This paper presents the toolbox developed in the ANTAREX European project for autotuning and adaptivity in energy-efficient HPC systems. In particular, the modules of the toolbox are described, as well as some preliminary results of its application to two target use cases.
Power and energy consumption are becoming key challenges in the supercomputers' exascale race. HPC systems' processors waste active power during communication and synchronization among MPI processes in large-scale applications. However, due to the time scale at which this happens, transitioning into low-power states while waiting for the completion of each phase may introduce unacceptable overhead. In this article, we present COUNTDOWN, a run-time library for identifying and automatically reducing the CPUs' power consumption during synchronization...
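The underlying mechanism can be sketched with the standard MPI profiling interface (PMPI), which lets a library intercept blocking calls and request a lower power state before the wait. This is a minimal illustration, not COUNTDOWN's actual implementation, and the two power helpers are hypothetical stand-ins for a real DVFS or C-state knob.

```c
/* Sketch of intercepting a blocking MPI call via the standard PMPI
 * profiling interface and lowering the CPU power state while waiting.
 * NOT the actual COUNTDOWN code; the power helpers are hypothetical.
 * Build, e.g.: mpicc -shared -fPIC countdown_sketch.c -o libwrap.so */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical knobs: a real system would write to a DVFS or idle-state
 * interface (e.g. cpufreq); here they only log the intent. */
static void set_cpu_low_power(void)  { fprintf(stderr, "[wrap] low power\n"); }
static void set_cpu_high_power(void) { fprintf(stderr, "[wrap] high power\n"); }

/* Intercept MPI_Barrier: drop to a low-power state for the (potentially
 * long) wait, then restore full performance before returning to compute. */
int MPI_Barrier(MPI_Comm comm)
{
    set_cpu_low_power();
    int err = PMPI_Barrier(comm);   /* forward to the real implementation */
    set_cpu_high_power();
    return err;
}
```

The same pattern applies to other blocking primitives (waits, collectives); the difficulty highlighted in the abstract is deciding when the wait is long enough to amortize the power-state transition.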
Manufacturing and environmental variations cause timing errors in microelectronic processors that are typically avoided by ultra-conservative multi-corner design margins or corrected by error detection and recovery mechanisms at the circuit level. In contrast, we present here runtime software support for cost-effective countermeasures against hardware failures during system operation. We propose a variability-aware OpenMP (VOMP) programming environment, suitable for tightly-coupled shared-memory...
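As a rough illustration of variability-aware work distribution (not VOMP's actual scheduler), the sketch below partitions loop iterations across cores in proportion to one minus a per-core timing-error estimate; the error probabilities are invented.

```c
/* Illustrative variability-aware partitioning: cores with a higher
 * estimated timing-error likelihood receive fewer iterations.
 * The error estimates are invented; VOMP's real policy differs. */
#include <stdio.h>

#define N_CORES 4

int main(void)
{
    /* Hypothetical per-core timing-error likelihood estimates. */
    double error_prob[N_CORES] = {0.01, 0.02, 0.10, 0.05};
    int total_iters = 1000;

    /* Weight each core by (1 - error_prob): error-prone cores get less. */
    double weight[N_CORES], sum = 0.0;
    for (int c = 0; c < N_CORES; c++) {
        weight[c] = 1.0 - error_prob[c];
        sum += weight[c];
    }

    int assigned = 0;
    for (int c = 0; c < N_CORES; c++) {
        int share = (c == N_CORES - 1)
                        ? total_iters - assigned              /* remainder */
                        : (int)(total_iters * weight[c] / sum);
        assigned += share;
        printf("core %d (p_err=%.2f): %d iterations\n", c, error_prob[c], share);
    }
    return 0;
}
```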
Fine-grain time synchronization is important to address several challenges in today's and future High Performance Computing (HPC) centers. Among the many, (i) co-scheduling techniques for parallel applications with sensitive bulk-synchronous workloads, (ii) performance analysis tools, and (iii) autotuning strategies that want to exploit state-of-the-art (SoA) high-resolution monitoring systems are three examples where an accuracy of a few microseconds is required. Previous works report custom solutions to reach this without...
In this manuscript, we evaluate the impact of hardware power-capping mechanisms on a real scientific application composed of parallel execution phases. By comparing the capping mechanism against static frequency allocation schemes, we show that a speed-up can be achieved if the power constraint is enforced on average during the run instead of over short time periods. RAPL, which enforces the cap at a few-millisecond scale, fails to share the power budget between the more demanding and the less demanding phases.
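For reference, on Linux the RAPL package limit is commonly exposed through the powercap sysfs tree. The sketch below reads and adjusts it, assuming the conventional intel-rapl:0 zone path (which varies by platform) and root privileges for writing.

```c
/* Minimal sketch of inspecting/adjusting an Intel RAPL package power limit
 * through the Linux powercap sysfs interface. The zone path is an
 * assumption (it varies by machine); writing requires root. */
#include <stdio.h>

#define LIMIT_FILE "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"

int main(void)
{
    long long limit_uw = 0;
    FILE *f = fopen(LIMIT_FILE, "r");
    if (!f || fscanf(f, "%lld", &limit_uw) != 1) {
        fprintf(stderr, "cannot read %s\n", LIMIT_FILE);
        if (f) fclose(f);
        return 1;
    }
    fclose(f);
    printf("current package power limit: %.1f W\n", limit_uw / 1e6);

    /* Example: cap the package at 100 W (100e6 microwatts). */
    f = fopen(LIMIT_FILE, "w");
    if (!f) { fprintf(stderr, "need root to write the limit\n"); return 1; }
    fprintf(f, "%lld\n", 100000000LL);
    fclose(f);
    return 0;
}
```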
We are entering the era of thermally-bound computing: advanced and costly cooling solutions are needed to sustain high computing densities in high-performance equipment. To reduce costs and overprovisioning, dynamic thermal management (DTM) strategies aim at controlling device temperature by modulating online the performance of processing elements. While operating systems allow migration of threads between cores, in HPC parallel applications are pinned to the allocated cores at start-time to avoid job-migration overheads. In...
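A minimal sketch of thermally-aware mapping for pinned threads (not the paper's algorithm): greedily assign the most power-hungry threads to the coolest cores. All powers and temperatures below are invented placeholders.

```c
/* Illustrative greedy thermal-aware mapping: the most power-hungry threads
 * are paired with the coolest cores. Inputs are hypothetical. */
#include <stdio.h>

#define N 4

int main(void)
{
    double thread_power[N] = {12.0, 30.0, 18.0, 25.0}; /* watts   */
    double core_temp[N]    = {72.0, 65.0, 80.0, 60.0}; /* celsius */
    int thread_order[N], core_order[N];

    for (int i = 0; i < N; i++) thread_order[i] = core_order[i] = i;

    /* Sort indices: threads by descending power, cores by ascending temp. */
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            if (thread_power[thread_order[j]] > thread_power[thread_order[i]]) {
                int t = thread_order[i];
                thread_order[i] = thread_order[j]; thread_order[j] = t;
            }
            if (core_temp[core_order[j]] < core_temp[core_order[i]]) {
                int t = core_order[i];
                core_order[i] = core_order[j]; core_order[j] = t;
            }
        }

    /* Greedy pairing: most power-hungry thread -> coolest core. */
    for (int i = 0; i < N; i++)
        printf("thread %d (%.0f W) -> core %d (%.0f C)\n",
               thread_order[i], thread_power[thread_order[i]],
               core_order[i], core_temp[core_order[i]]);
    return 0;
}
```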
The power consumption of supercomputers is a major challenge for system owners, users, and society. It limits the capacity of installations, requires large cooling infrastructures, and causes a large carbon footprint. Reducing power during application execution without changing the source code or increasing time-to-completion is highly desirable in real-life high-performance computing scenarios. The power-management run-time frameworks proposed in the last decade are based on the assumption that the duration of the communication phases of an MPI application can...
Manycore accelerators have recently proven a promising solution for increasingly powerful and energy-efficient computing systems. This raises the need for parallel programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering flexible support for fine-grained and irregular parallelism. However, efficiently supporting this paradigm on resource-constrained platforms is a challenging task. In this paper, we present an...
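The paradigm itself can be illustrated with plain OpenMP tasking (standard OpenMP, not the accelerator runtime presented in the paper): irregular, recursive work expressed as fine-grained tasks with a cut-off to limit overhead.

```c
/* Plain OpenMP tasking example of fine-grained, irregular parallelism
 * (standard OpenMP; not the paper's accelerator runtime).
 * Compile with: cc -fopenmp tasks.c */
#include <stdio.h>

/* Recursive Fibonacci: each call spawns irregular child tasks. */
static long fib(int n)
{
    if (n < 2) return n;
    long a, b;
    #pragma omp task shared(a) if(n > 20)   /* cut-off: keep tiny tasks inline */
    a = fib(n - 1);
    b = fib(n - 2);
    #pragma omp taskwait                    /* wait for the child task */
    return a + b;
}

int main(void)
{
    long r;
    #pragma omp parallel
    #pragma omp single                      /* one thread builds the task tree */
    r = fib(30);
    printf("fib(30) = %ld\n", r);
    return 0;
}
```

The challenge the abstract refers to is supporting exactly this kind of dynamic task creation and synchronization efficiently when cores, memory, and runtime overhead budgets are tightly constrained.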
High-performance computational kernels that optimally exploit modern vector-capable processors are critical for running large-scale drug discovery campaigns efficiently and promptly, compatible with the constraints posed by urgent computing needs. Yet, state-of-the-art virtual screening workflows focus either on the broadness of the features provided to the researcher or on performance on high-throughput accelerators, leaving the task of deploying efficient CPU code to the compiler. We ported key parts of the LiGen pipeline, based...
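A minimal sketch of the kind of vector-friendly kernel this targets (not LiGen code): a contiguous accumulation loop annotated with OpenMP SIMD so the compiler can emit vector instructions.

```c
/* Minimal vector-friendly kernel (not LiGen code): a scoring-style
 * accumulation over contiguous arrays, annotated with OpenMP SIMD.
 * Compile with: cc -O3 -fopenmp-simd simd.c */
#include <stdio.h>

#define N 1024

/* Sum of pairwise products, e.g. a score-like accumulation. */
static double score(const double *a, const double *b, int n)
{
    double s = 0.0;
    #pragma omp simd reduction(+:s)
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

int main(void)
{
    static double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = i * 0.5; b[i] = 1.0 / (i + 1); }
    printf("score = %f\n", score(a, b, N));
    return 0;
}
```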
The main limitation of applying predictive tools to large-scale supercomputers is the complexity of deploying Artificial Intelligence (AI) services in production and of modeling heterogeneous data sources while preserving topological information in compact models. This paper proposes GRAAFE, a framework for continuously predicting compute node failures on the Marconi100 supercomputer. It consists of (i) an anomaly prediction model based on graph neural networks (GNNs) that leverages the nodes' physical layout in the room...
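To make the GNN component concrete, the sketch below performs one mean-aggregation message-passing step over a tiny node graph; the graph, the features, and the absence of learned weights are simplifying assumptions, not the GRAAFE model or Marconi100 data.

```c
/* Tiny illustration of one graph-neural-network message-passing step:
 * each node's new feature is the mean of its own and its neighbours'
 * features (no learned weights). Graph and features are invented. */
#include <stdio.h>

#define N_NODES 4
#define N_FEATS 2

int main(void)
{
    /* Adjacency of a small "rack": node i connects to node j if adj=1. */
    int adj[N_NODES][N_NODES] = {
        {0, 1, 1, 0},
        {1, 0, 1, 0},
        {1, 1, 0, 1},
        {0, 0, 1, 0},
    };
    /* Per-node features, e.g. normalized temperature and load. */
    double x[N_NODES][N_FEATS] = {
        {0.2, 0.9}, {0.3, 0.8}, {0.9, 0.4}, {0.1, 0.1},
    };
    double h[N_NODES][N_FEATS];

    for (int i = 0; i < N_NODES; i++) {
        int deg = 0;
        for (int f = 0; f < N_FEATS; f++) h[i][f] = x[i][f]; /* self term */
        for (int j = 0; j < N_NODES; j++) {
            if (!adj[i][j]) continue;
            deg++;
            for (int f = 0; f < N_FEATS; f++) h[i][f] += x[j][f];
        }
        for (int f = 0; f < N_FEATS; f++) h[i][f] /= (deg + 1); /* mean */
        printf("node %d: (%.2f, %.2f)\n", i, h[i][0], h[i][1]);
    }
    return 0;
}
```

Stacking such aggregation steps, with learned weights and the room layout as the graph, is the general mechanism by which a GNN can exploit topological information for node-level failure prediction.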