- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Distributed and Parallel Computing Systems
- Real-Time Systems Scheduling
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Advanced Battery Technologies Research
- Distributed systems and fault tolerance
- Advancements in Battery Materials
- Low-power high-performance VLSI design
- Radiation Effects in Electronics
- Advanced Battery Materials and Technologies
- Advanced Memory and Neural Computing
- Robotics and Sensor-Based Localization
- Semiconductor materials and devices
- Numerical Methods and Algorithms
- Ferroelectric and Negative Capacitance Devices
- Robotic Path Planning Algorithms
- Real-time simulation and control systems
- 3D IC and TSV technologies
- CCD and CMOS Imaging Sensors
- Digital Filter Design and Implementation
- Advancements in Semiconductor Devices and Circuit Design
- Advanced Image and Video Retrieval Techniques
University of Modena and Reggio Emilia
2015-2024
University of Bologna
2011-2022
ETH Zurich
2014-2022
Jülich Aachen Research Alliance
2013-2017
RWTH Aachen University
2013-2017
École Polytechnique Fédérale de Lausanne
2014-2016
Laboratori Guglielmo Marconi (Italy)
2016
University of Cagliari
2002-2010
This article consists of a collection of slides from the authors' conference presentation.
In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. The adoption of FP formats requiring a lower number of bits is an interesting opportunity to reduce energy consumption, since it makes it possible to simplify the arithmetic circuitry and to reduce the memory bandwidth required to transfer data between memory and registers by enabling...
Over the last few years, the ever-increasing use of Graphic Processing Units (GPUs) in safety-related domains has opened up many research problems in the real-time community. The closed and proprietary nature of the scheduling mechanisms deployed in NVIDIA GPUs, for instance, represents a major obstacle in deriving a proper schedulability analysis for latency-sensitive applications. Existing literature addresses these issues by either (i) providing simplified models for heterogeneous CPU-GPU systems and their associated scheduling policies,...
Most of today's state-of-the-art processors for mobile and embedded systems feature on-chip scratchpad memories. To efficiently exploit the advantages of low-latency high-bandwidth memory modules in the hierarchy, there is the need for programming models and/or language features that expose such architectural details. On the other hand, effectively exploiting the limited space requires the programmer to devise an efficient partitioning and distributed placement of shared data at the application level. In this paper, we propose a...
OpenMP is increasingly being supported by the newest high-end embedded many-core processors. Despite the lack of any notion of real-time execution, the latest specification (v4.0) introduces a tasking model that resembles the way real-time embedded applications are modeled and designed, i.e., as a set of periodic task graphs. This makes OpenMP4 a convenient candidate to be adopted in future real-time systems. However, OpenMP4 also incorporates features to guarantee backward compatibility with previous versions that limit its practical usability in real-time systems. The most...
Driven by flexibility, performance and cost constraints of demanding modern applications, heterogeneous System-on-Chip (SoC) is the dominant design paradigm in the embedded system computing domain. SoC architecture and heterogeneity clearly provide a wider power/performance scaling, combining high performance and power efficient general-purpose cores along with massively parallel many-core-based accelerators. Besides the complex hardware, generally these kinds of platforms also host an advanced software ecosystem,...
This paper aims to take stock of recent advances in the field of energy-quality (EQ) scalable circuits and systems, as a promising direction to continue the historical exponential energy downscaling under diminished returns from technology and voltage scaling. EQ-scalable systems explicitly trade off energy and quality at different levels of abstraction and sub-systems, dealing with "quality" as an explicit design requirement, and reducing energy whenever the application, task, or dataset allow quality degradation (e.g., vision and machine learning). A...
Ultra-low power computing is a key enabler of deeply embedded platforms used in domains such as distributed sensing, internet of things, and wearable computing. The rising computational demands and high dynamic range of target algorithms often call for hardware support of floating-point (FP) arithmetic and high system energy efficiency. In light of transprecision computing, where the accuracy of data is consciously changed during the execution of applications, custom FP types are being used to optimize a wide range of problems. We two - one 16...
Next-generation many-core embedded platforms have the chance of intercepting a converging need for high performance and predictability. Programming methodologies for such platforms will have to promote predictability as a first-class design constraint, along with features for massive parallelism exploitation. OpenMP, increasingly adopted in the embedded systems domain, has recently evolved to deal with the programmability of heterogeneous many-cores, with mature support for fine-grained task parallelism. While tasking is potentially very convenient...
In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the urge for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in applications. However, efficiently supporting this paradigm on PMCAs is challenging, due to the large time and space...
In recent years approximate computing has been extensively explored as a paradigm to design hardware and software solutions that save energy by trading off on the quality of the computed results. In applications that involve numerical computations with a wide dynamic range, precision tuning of floating-point (FP) variables is a key knob to leverage the energy/quality tradeoff of the program results. This aspect assumes maximum relevance in the transprecision computing scenario, where the accuracy of data is tuned at fine grain in the application code. Performing...
The deployment of real-time workloads on commercial off-the-shelf (COTS) hardware is attractive, as it reduces the cost and time-to-market of new products. Most modern high-end embedded SoCs rely on a heterogeneous design, coupling a general-purpose multi-core CPU to a massively parallel accelerator, typically a programmable GPU, sharing a single global DRAM. However, because of non-predictable arbiters designed to maximize average or peak performance, it is very difficult to provide timing guarantees on such systems. In...
Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities. While leading companies successfully advance their HESoC products, research lags behind due to the challenges of building a prototyping platform that unites an industry-standard host processor with an open research PMCA architecture. In this work we introduce HERO, an FPGA-based research platform that combines a PMCA composed of clusters...
In this paper a study and an experimental analysis on a lithium iron phosphate battery under different operating conditions is reported in order to investigate its potential application in electric vehicles and hybrid electric vehicles. The analysis of unloading and loading characteristics and of the energetic storage process efficiency has been developed. Unloading and loading characteristics, and temperature sensitivity in the range −15 °C to +50 °C, have been determined. To evaluate the dynamic performance of the battery for vehicle application, a typical load variations test has been conducted.
Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared memory clusters. A hierarchical interconnection system is used, with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level, which makes memory operations non-uniform (NUMA). Nested parallelism represents a powerful programming abstraction for these architectures, where a first level of parallelism can be used to distribute coarse-grained tasks to clusters, and additional levels of fine-grained parallelism can be distributed...