- Computational Physics and Python Applications
- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Advancements in Semiconductor Devices and Circuit Design
- Distributed and Parallel Computing Systems
- Advanced Computational Techniques and Applications
- Image Processing and 3D Reconstruction
- Low-power high-performance VLSI design
- Anomaly Detection Techniques and Applications
- 3D IC and TSV technologies
- Network Security and Intrusion Detection
- Cloud Computing and Resource Management
- Ferroelectric and Negative Capacitance Devices
- Thermal properties of materials
- Advanced Memory and Neural Computing
- Advancements in Photolithography Techniques
- Semiconductor materials and devices
- Embedded Systems Design Techniques
- Software System Performance and Reliability
- Particle accelerators and beam dynamics
- Stochastic Gradient Optimization Techniques
- Heat Transfer and Optimization
- Model Reduction and Neural Networks
- Air Quality Monitoring and Forecasting
- VLSI and FPGA Design Techniques
University of Bologna
2014-2024
University of Michigan
2022
ETH Zurich
2022
Tampere University
2022
Infineon Technologies (Germany)
2022
Queen's University Belfast
2022
University of California, Berkeley
2022
Infineon Technologies (United Kingdom)
2022
Laboratori Guglielmo Marconi (Italy)
2021
Columbia University
2013
Aggressive thermal management is a critical feature for high-end computing platforms, as worst-case thermal budgeting is becoming unaffordable. Reactive thermal management, which sets temperature thresholds to trigger capping actions, is too “near-sighted,” and it may lead to severe performance degradation and thermal overshoots. More aggressive proactive management techniques minimize the penalty with smooth optimal control. These techniques require knowledge of thermal models, which have to be accurate and simple to make the controls effective, while keeping...
Exascale computing represents the next leap in the HPC race. Reaching this level of performance is subject to several engineering challenges such as energy consumption, equipment cooling, reliability, and massive parallelism. Model-based optimization is an essential tool in the design process and control of efficient, reliable, and thermally constrained systems. However, in this domain, model learning techniques tailored to the specific supercomputer require real measurements and must therefore handle and analyze a massive amount of data coming from...
Supercomputers are the most powerful computing machines available to society. They play a central role in economic, industrial, and societal development. While they are used by scientists, engineers, decision-makers, and data analysts to computationally solve complex problems, supercomputers and their hosting datacenters are themselves power-hungry systems. Improving their efficiency, availability, and resiliency is vital and the subject of many research and engineering efforts. Still, a major roadblock hinders researchers: the dearth...
Energy efficiency and datacentre automation are critical targets of the research and deployment agenda of CINECA and its partners in the Energy Efficient System Laboratory of the University of Bologna and the Integrated System Laboratory of ETH Zurich. In this manuscript, we present the primary outcomes of the research conducted in this domain under the umbrella of several European, National, and Private funding schemes. These consist of: (i) the ExaMon scalable, flexible, holistic monitoring framework, which is capable of ingesting 70GB/day of telemetry data of the entire datacentre and to link it with machine learning...
Decentralised Machine Learning (DML) enables collaborative machine learning without centralised input data. Federated Learning (FL) and Edge Inference are examples of DML. While tools for DML (especially FL) are starting to flourish, many are not flexible and portable enough to experiment with novel processors (e.g., RISC-V), non-fully connected network topologies, and asynchronous collaboration schemes. We overcome these limitations via a domain-specific language allowing us to map DML schemes to an underlying middleware, i.e....
The thermal wall for many-core systems-on-chip calls for advanced management techniques to maximize performance while capping temperatures. Distributed and compact thermal models are a cornerstone for such techniques. System identification methodologies allow extracting these models directly from the target device's thermal response. Unfortunately, standard Auto-Regressive eXogenous Least Squares methodologies cannot effectively tackle both the model approximation and the measurement noise typical of real systems. In this work, we propose a novel distributed...
In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy-efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM's POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 100 Gb/s networking) plus custom hardware and system middleware software. D.A.V.I.D.E. features (i) a dedicated power monitor interface, built around the...
Self-heating and high operating temperature are major concerns in 3-D chip integration. In this paper, we leverage a 3-D test chip (WideIO dynamic random access memory on top of a logic die) equipped with temperature sensors and heaters to explore thermal effects and develop advanced modeling strategies suitable for complex 3-D-stacked circuits. We correlate the temperature measurements with the power dissipated by the heaters using model learning techniques. Moreover, we defined a basis function obtained from the data available from on-chip sensors. This can...
In the race toward exascale, supercomputing systems are facing important challenges which limit the efficiency of the system. Among all, power and energy consumption, fueled by the end of Dennard scaling, are starting to show their impact in limiting supercomputers' peak performance and cost effectiveness.
The new open and royalty-free RISC-V ISA is attracting interest across the whole computing continuum, from microcontrollers to supercomputers. High-performance RISC-V processors and accelerators have been announced, but RISC-V-based HPC systems will need a holistic co-design effort, spanning memory, storage hierarchy, interconnects, and the full software stack. In this paper, we describe Monte Cimone, a fully operational multi-blade computer prototype and hardware-software test-bed based on the U740, a double-precision...
Energy efficiency is of primary interest in future HPC systems, as their computational growth is limited by the supercomputer peak power consumption. A significant part of the power consumed by a machine is caused by the cooling infrastructure. Today's thermal design is based on coarse-grain models which consider the silicon die of the processing elements as an isothermal surface. Similarly, the thermal feedback control loops use the same assumption to modulate the cooling effort, with the goal of reducing the cooling cost and maintaining the temperature in a safe working range. Recent...
Processors for high-performance computing and server workloads are today thermally constrained. To preserve a safe working temperature, state-of-the-art processors for this market segment integrate many cores on the same die and feature fine-grain power management and thermal feedback loops implemented in hardware. However, to keep the control policy simple, these controllers fail to take advantage of the underlying heterogeneity, the long thermal transients, and the specific user mode. In this paper, we present a self-aware framework making...
Compact thermal models and modeling strategies are today a cornerstone for advanced power management to counteract the emerging thermal crisis of many-core systems-on-chip. System identification techniques allow extracting models directly from the target device's thermal response. Unfortunately, standard Least Squares techniques cannot effectively cope with both the model approximation and the measurement noise typical of real systems. In this work, we present a novel distributed identification strategy capable of coping with real-life temperature sensor noise and of extracting a set...
Aggressive thermal management is a critical feature for high-end computing platforms, as worst-case thermal budgeting is becoming unaffordable. Reactive thermal management, which sets temperature thresholds to trigger capping actions, is too "near-sighted", and it may lead to severe performance degradation and thermal overshoots. More aggressive proactive thermal management minimizes the penalty with smooth optimal control, but requires the knowledge of the system models to be precise. Unfortunately, in practice these models are not provided by equipment...
Distributed and compact thermal models are at the basis of thermal-aware design and on-line optimization of the cooling effort in future High-Performance Computing systems. These models can be directly extracted from the target device's thermal response by means of system identification techniques. This paper proposes a novel system identification approach for real-life production HPC systems. Our approach is capable of extracting MISO thermal models of a supercomputing node in a production deployment scenario affected by quantization noise on the temperature measurements, as well as operating in free-cooling, with...
Dynamic thermal management (DTM) is a key technology for future many-core systems. Indeed, many-core systems, both server-class and embedded chip multiprocessors, are thermally constrained. DTM design requires consideration of the chain of interactions between HW operating points, workload phases, power consumption, die temperature, monitor infrastructure, and control policy. Hugely different time scales are involved, from microseconds to hours. Simulating the performance of DTM solutions for a many-core system in a reasonable time is an open...
Power and thermal design and management are critical components of high-performance computing (HPC) systems, due to their cutting-edge position in terms of power density and large total power consumption. Many HPC power management strategies rely on the availability of accurate and compact power models, capable of predicting power consumption and tracking its sensitivity to workload parameters and operating points. In this paper we describe a methodology and a framework for training power models derived with two best-in-class procedures directly on online production nodes...
High temperature is one of the limiting factors and major concerns in 3D chip integration. In this paper we use a 3D test chip (WIDEIO DRAM on top of a logic die) equipped with temperature sensors and heaters to explore thermal effects. We correlated real temperature measurements with the power dissipated by the heaters using model learning techniques. The resulting compact model is able to predict temperatures at locations far from the sensors and to infer the power dissipation at any location of the chip. Results are verified by means of an off-sample validation technique and show high accuracy when...
Datacenters play a vital role in today's society. At large, a datacenter room is a complex controlled environment composed of thousands of computing nodes, which consume kW of power. To dissipate the power, forced air/liquid flow is employed, with a cost of millions of euros per year. Reducing this cost involves using free-cooling and average-case design, which can create a cooling shortage and thermal hazards. When a thermal hazard happens, system administrators and the facility manager must stop production to avoid IT equipment damage and wear-out....