Andrea Marongiu

ORCID: 0000-0003-1010-4762
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Interconnection Networks and Systems
  • Distributed and Parallel Computing Systems
  • Real-Time Systems Scheduling
  • Advanced Data Storage Technologies
  • Cloud Computing and Resource Management
  • Advanced Battery Technologies Research
  • Distributed systems and fault tolerance
  • Advancements in Battery Materials
  • Low-power high-performance VLSI design
  • Radiation Effects in Electronics
  • Advanced Battery Materials and Technologies
  • Advanced Memory and Neural Computing
  • Robotics and Sensor-Based Localization
  • Semiconductor materials and devices
  • Numerical Methods and Algorithms
  • Ferroelectric and Negative Capacitance Devices
  • Robotic Path Planning Algorithms
  • Real-time simulation and control systems
  • 3D IC and TSV technologies
  • CCD and CMOS Imaging Sensors
  • Digital Filter Design and Implementation
  • Advancements in Semiconductor Devices and Circuit Design
  • Advanced Image and Video Retrieval Techniques

University of Modena and Reggio Emilia
2015-2024

University of Bologna
2011-2022

ETH Zurich
2014-2022

Jülich Aachen Research Alliance
2013-2017

RWTH Aachen University
2013-2017

École Polytechnique Fédérale de Lausanne
2014-2016

Laboratori Guglielmo Marconi (Italy)
2016

University of Cagliari
2002-2010

In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. The adoption of FP formats requiring a lower number of bits is an interesting opportunity to reduce energy consumption, since it allows to simplify the arithmetic circuitry and to reduce the bandwidth required to transfer data between memory and registers by enabling...

10.23919/date.2018.8342167 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2018-03-01

Over the last few years, the ever-increasing use of Graphic Processing Units (GPUs) in safety-related domains has opened up many research problems in the real-time community. The closed and proprietary nature of the scheduling mechanisms deployed in NVIDIA GPUs, for instance, represents a major obstacle in deriving a proper schedulability analysis for latency-sensitive applications. Existing literature addresses these issues by either (i) providing simplified models for heterogeneous CPU-GPU systems and their associated policies,...

10.1109/rtas48715.2020.000-5 article EN 2020-04-01

Most of today's state-of-the-art processors for mobile and embedded systems feature on-chip scratchpad memories. To efficiently exploit the advantages of low-latency high-bandwidth memory modules in the hierarchy, there is a need for programming models and/or language features that expose such architectural details. On the other hand, effectively exploiting the limited space requires the programmer to devise an efficient partitioning and distributed placement of shared data at the application level. In this paper, we propose a...

10.1109/tc.2010.199 article EN IEEE Transactions on Computers 2010-10-19

OpenMP is increasingly being supported by the newest high-end embedded many-core processors. Despite the lack of any notion of real-time execution, the latest specification (v4.0) introduces a tasking model that resembles the way real-time applications are modeled and designed, i.e., as a set of periodic task graphs. This makes OpenMP4 a convenient candidate to be adopted in future real-time systems. However, it also incorporates features to guarantee backward compatibility with previous versions, which limit its practical usability. The most...

10.1109/cases.2015.7324556 article EN 2015-10-01

OpenMP is increasingly being supported by the newest high-end embedded many-core processors. Despite the lack of any notion of real-time execution, the latest specification (v4.0) introduces a tasking model that resembles the way real-time applications are modeled and designed, i.e., as a set of periodic task graphs. This makes OpenMP4 a convenient candidate to be adopted in future real-time systems. However, it also incorporates features to guarantee backward compatibility with previous versions, which limit its practical usability. The most...

10.5555/2830689.2830709 article EN Compilers, Architecture, and Synthesis for Embedded Systems 2015-10-04

Driven by the flexibility, performance and cost constraints of demanding modern applications, heterogeneous System-on-Chip (SoC) is the dominant design paradigm in the embedded system computing domain. SoC architecture heterogeneity clearly provides a wider power/performance scaling, combining high-performance and power-efficient general-purpose cores along with massively parallel many-core-based accelerators. Besides the complex hardware, these kinds of platforms generally also host an advanced software ecosystem,...

10.1109/ipdpsw.2013.177 article EN 2013-05-01

This paper aims to take stock of recent advances in the field of energy-quality (EQ) scalable circuits and systems, as a promising direction to continue the historical exponential energy downscaling under diminished returns from technology and voltage scaling. EQ-scalable systems explicitly trade off energy and quality at different levels of abstraction and sub-systems, dealing with "quality" as an explicit design requirement, and reducing energy whenever the application, task, or dataset allows for quality degradation (e.g., vision, machine learning). A...

10.1109/jetcas.2018.2881461 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2018-11-15

Ultra-low power computing is a key enabler of deeply embedded platforms used in domains such as distributed sensing, internet of things, and wearable computing. The rising computational demands and high dynamic range of target algorithms often call for hardware support for floating-point (FP) arithmetic and system energy efficiency. In light of transprecision computing, where the accuracy of data is consciously changed during the execution of applications, custom FP types are being used to optimize a wide range of problems. We two - one 16...

10.1109/iscas.2018.8351816 article EN 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 2018-01-01

Next-generation many-core embedded platforms have the chance of intercepting a converging need for high performance and predictability. Programming methodologies for such platforms will have to promote predictability as a first-class design constraint, along with features for massive parallelism exploitation. OpenMP, increasingly adopted in the embedded systems domain, has recently evolved to deal with the programmability of heterogeneous many-cores, with mature support for fine-grained task parallelism. While tasking is potentially very convenient...

10.5555/2755753.2755893 article EN Design, Automation, and Test in Europe 2015-03-09

In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the urge for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in applications. However, efficiently supporting this paradigm on PMCAs is challenging, due to the large time and space...

10.1109/tpds.2018.2814602 article EN IEEE Transactions on Parallel and Distributed Systems 2018-03-12

In recent years approximate computing has been extensively explored as a paradigm to design hardware and software solutions that save energy by trading off on the quality of computed results. In applications that involve numerical computations with a wide dynamic range, precision tuning of floating-point (FP) variables is a key knob to leverage the energy/quality tradeoff of a program. This aspect assumes maximum relevance in the transprecision computing scenario, where the accuracy of data is tuned at fine grain in the application code. Performing...

10.1109/tcad.2018.2883902 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2018-12-04

Next-generation many-core embedded platforms have the chance of intercepting a converging need for high performance and predictability. Programming methodologies for such platforms will have to promote predictability as a first-class design constraint, along with features for massive parallelism exploitation. OpenMP, increasingly adopted in the embedded systems domain, has recently evolved to deal with the programmability of heterogeneous many-cores, with mature support for fine-grained task parallelism. While tasking is potentially very convenient...

10.7873/date.2015.0778 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2015-01-01

The deployment of real-time workloads on commercial off-the-shelf (COTS) hardware is attractive, as it reduces the cost and time-to-market of new products. Most modern high-end embedded SoCs rely on a heterogeneous design, coupling a general-purpose multi-core CPU to a massively parallel accelerator, typically a programmable GPU, sharing a single global DRAM. However, because of non-predictable arbiters designed to maximize average or peak performance, it is very difficult to provide timing guarantees on such systems. In...

10.23919/date.2017.7927008 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2017-03-01

10.1016/j.micpro.2011.08.010 article EN Microprocessors and Microsystems 2011-08-25

Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities. While leading companies successfully advance their HESoC products, research lags behind due to the challenges of building a prototyping platform that unites an industry-standard host processor with an open PMCA architecture. In this work we introduce HERO, an FPGA-based research platform that combines a PMCA composed of clusters...

10.48550/arxiv.1712.06497 preprint EN other-oa arXiv (Cornell University) 2017-01-01

In this paper a study and an experimental analysis on a lithium iron phosphate battery under different operating conditions is reported in order to investigate its potential application in electric vehicles and hybrid vehicles. The analysis of the unloading and loading characteristics and of the energetic efficiency of the storage process has been developed. Unloading characteristics and temperature sensitivity in the range −15 °C to +50 °C have been determined. To evaluate the dynamic performance of the battery for vehicle application, a typical load variations test has been conducted.

10.1109/isie.2010.5637749 article EN 2010-07-01

Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared memory clusters. A hierarchical interconnection system is used, with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level, which makes memory operations non-uniform (NUMA). Nested parallelism represents a powerful programming abstraction for these architectures, where a first level of parallelism can be used to distribute coarse-grained tasks to clusters, and additional levels of fine-grained parallelism can be distributed...

10.5555/2492708.2492734 article EN Design, Automation, and Test in Europe 2012-03-12