- Numerical Methods and Algorithms
- Low-power high-performance VLSI design
- Parallel Computing and Optimization Techniques
- Ferroelectric and Negative Capacitance Devices
- Advanced Memory and Neural Computing
- Digital Filter Design and Implementation
- Embedded Systems Design Techniques
- Advanced Neural Network Applications
- Analog and Mixed-Signal Circuit Design
- CCD and CMOS Imaging Sensors
- Semiconductor materials and devices
- Advancements in PLL and VCO Technologies
- EEG and Brain-Computer Interfaces
- Advancements in Semiconductor Devices and Circuit Design
- Inertial Sensor and Navigation
- Quantum Computing Algorithms and Architecture
- Big Data and Digital Economy
- Green IT and Sustainability
- Sensor Technology and Measurement Systems
- Robotics and Sensor-Based Localization
- Metal-Catalyzed Oxygenation Mechanisms
- Bluetooth and Wireless Communication Technologies
- Metal complexes synthesis and properties
- Advanced Data Storage Technologies
- Advanced Sensor and Energy Harvesting Materials
ETH Zurich
2017-2022
University of Bologna
2022
In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with a large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. The adoption of FP formats requiring a lower number of bits is an interesting opportunity to reduce energy consumption, since it allows the arithmetic circuitry to be simplified and reduces the memory bandwidth required to transfer data between registers and memory, enabling...
The slowdown of Moore's law and the power wall necessitate a shift toward finely tunable precision (a.k.a. transprecision) computing to reduce the energy footprint. Hence, we need circuits capable of performing floating-point operations on a wide range of precisions with high energy-proportionality. We present FPnew, a highly configurable open-source transprecision floating-point unit (TP-FPU) supporting standard and custom FP formats. To demonstrate the flexibility and efficiency of FPnew in general-purpose processor architectures, we extend...
The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 μW fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) of peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM and 4 MB of non-volatile MRAM...
Ultra-low-power computing is a key enabler of deeply embedded platforms used in domains such as distributed sensing, the Internet of Things, and wearable computing. The rising computational demands and high dynamic range of target algorithms often call for hardware support for floating-point (FP) arithmetic to improve system energy efficiency. In light of transprecision computing, where the accuracy of data is consciously changed during the execution of applications, custom FP types are being used to optimize a wide range of problems. We present two such types - one 16...
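A reduced 16-bit FP type of the kind discussed here can be sketched in a few lines of Python. The bfloat16-style layout below (1 sign, 8 exponent, 7 mantissa bits, obtained by truncating a binary32 word) is chosen purely for illustration and is not necessarily the exact format introduced in the paper:

```python
import struct

def float32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE 754 binary32 value to a bfloat16-style
    16-bit pattern (1 sign, 8 exponent, 7 mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16  # keep the top 16 bits, drop 16 mantissa bits

def bf16_bits_to_float32(b: int) -> float:
    """Expand a 16-bit pattern back to binary32 by zero-padding
    the discarded mantissa bits."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# 1.0 is exactly representable in 7 mantissa bits, so it round-trips;
# 3.14159 is recovered only to ~2-3 decimal digits.
y = bf16_bits_to_float32(float32_to_bf16_bits(3.14159))
```

Because the exponent field keeps its full 8 bits, such a format trades mantissa precision for the same dynamic range as binary32, which is exactly the property that makes narrow FP types attractive for near-sensor workloads.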
RISC-V is an open-source instruction set architecture (ISA) with a modular design consisting of a mandatory base part plus optional extensions. The RV32IMFC ISA configuration has been widely adopted for new-generation low-power processors. Motivated by the important energy savings that smaller-than-32-bit FP types have enabled in several application domains and related compute platforms, some recent studies have published encouraging early results for their adoption. In this paper we introduce extensions...
To overcome the limitations of conventional floating-point number formats, an interval arithmetic and variable-width storage format called the universal number (unum) has been recently introduced [1]. This paper presents the first (to the best of our knowledge) silicon implementation and measurements of an application-specific integrated circuit (ASIC) for unum arithmetic. The designed chip includes a 128-bit wide unit to execute additions and subtractions, while also supporting lossless (for intermediate results) and lossy...
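The guarantee that interval-based formats such as unum provide can be illustrated with a minimal Python sketch. This uses plain float pairs with outward rounding, not the actual variable-width unum encoding:

```python
import math

def interval_add(a, b):
    """Add closed intervals (lo, hi) with outward rounding: each
    bound is widened by one ulp via math.nextafter, so the exact
    real-valued sum is guaranteed to lie inside the result even
    though the float additions themselves round."""
    lo = math.nextafter(a[0] + b[0], -math.inf)
    hi = math.nextafter(a[1] + b[1], math.inf)
    return (lo, hi)

x = (0.1, 0.2)
y = (0.3, 0.4)
lo, hi = interval_add(x, y)  # encloses the exact sum [0.4, 0.6]
```

A hardware unum unit performs the same kind of rigorous bound tracking directly on its variable-width encoding, which is what allows lossy compression of results without silently losing correctness information.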
We present a fully programmable ultra-low-power embedded platform that hosts an "electronic skin" (E-skin) with arrays of tactile sensors of up to 64 channels, ECG/EMG and 8 inertial sensors, and a Bluetooth Low Energy 5.0 module. The platform's compute engine is a heterogeneous multi-core parallel ultra-low-power (PULP) processor based on RISC-V, capable of delivering 2.5 GOPS within a 55 mW consumption envelope, which makes the platform ideal for battery-powered always-on operation. Experimental results show peak...
Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this paper, we propose a multi-core cluster that leverages the fine-grained tunable principles of transprecision computing to provide FP support at a minimum power budget. Our design - based on the open-source RISC-V architecture - combines parallelization and sub-word vectorization with near-threshold operation, leading to a highly scalable and versatile system. We...
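Sub-word vectorization of the kind used in such clusters treats one machine word as several independent narrow lanes whose carries never cross the lane boundary. The Python sketch below is a behavioral model of this idea for two unsigned 16-bit lanes packed into a 32-bit word, not the actual RTL of the design:

```python
def simd_add16(a: int, b: int) -> int:
    """Lane-wise addition of two 32-bit words holding two unsigned
    16-bit lanes each. Each lane wraps modulo 2**16 independently,
    so a carry out of the low lane never reaches the high lane."""
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    hi = (((a >> 16) + (b >> 16)) & 0xFFFF) << 16
    return hi | lo

# Low lane overflows (0xFFFF + 0x0001 wraps to 0x0000) without
# disturbing the high lane (0x0001 + 0x0001 = 0x0002).
r = simd_add16(0x0001FFFF, 0x00010001)
```

In hardware, the same effect is obtained nearly for free by gating the carry chain at the lane boundary, which is why sub-word SIMD doubles or quadruples throughput on narrow data at almost no area cost.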
Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of NN models and improving the energy efficiency of the underlying hardware architectures. Narrow integer data types have been vastly investigated for inference and successfully pushed to the extreme of ternary and binary representations. In contrast, most training-oriented platforms use at least 16-bit floating-point (FP) formats. Lower-precision formats, such as 8-bit FP and mixed-precision techniques, have only...
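The narrow-integer side of this trade-off is easy to demonstrate: symmetric per-tensor quantization maps float weights onto int8 with a single scale factor. The sketch below is a generic illustration of the technique, not the scheme of any specific platform mentioned here:

```python
def quantize_int8(w):
    """Symmetric per-tensor quantization: one scale maps the float
    range [-max|w|, +max|w|] onto the int8 range [-127, 127]."""
    scale = max(abs(x) for x in w) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in w]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # each entry within half a quantization step of w
```

The reconstruction error is bounded by half a step (scale / 2), which is acceptable for inference but is precisely what makes integer-only training hard and keeps training platforms on 16-bit (or, more recently, 8-bit) FP formats.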
The Metis AI Processing Unit (AIPU) is a quad-core System-on-Chip (SoC) designed for edge inference, executing all components of an AI workload on-chip. The AIPU exhibits a performance of 52.4 TOPS per core and a compound throughput of 209.6 TOPS. Key features of the SoC and its integration into a PCIe card-based system are shown in Fig. 11.3.1. Metis leverages the benefits of a quantized digital in-memory computing (D-IMC) architecture - with 8b weights, 8b activations, and full-precision accumulation - to decrease both the memory cost...
The crisis of Moore's law and the new dominant Machine Learning workloads require a paradigm shift towards finely tunable-precision (a.k.a. transprecision) computing. More specifically, we need floating-point circuits that are capable of operating on many formats with high flexibility. We present the first silicon implementation of a 64-bit transprecision unit. It fully supports standard double, single, and half precision, alongside custom bfloat and 8-bit formats. Operations occur on scalars or 2-, 4-, or 8-way SIMD...
The demand for floating-point compute power is ever growing. The domains of big data, machine learning, and scientific computing require a wide precision range and high operational intensity. The sheer number of FP operations, paired with the increased density implied by technology scaling, makes it more important than ever to achieve maximum energy efficiency for floating-point operations. In this work, we present Kosmodrom, our novel silicon solution in GlobalFoundries 22 nm Fully-Depleted Silicon-on-Insulator (FD-SOI)...
Miniaturizing an autonomous robot is a challenging task - not only the mechanical but also the electrical components have to operate within limited space, payload, and power. Furthermore, algorithms for autonomous navigation, such as state-of-the-art (SoA) visual-navigation deep neural networks (DNNs), are becoming increasingly complex, striving for more flexibility and agility. In this work, we present a sensor-rich, modular, nano-sized Unmanned Aerial Vehicle (UAV), almost as small as a five Swiss Franc coin, called...
In the Internet-of-Things (IoT) domain, microcontrollers (MCUs) are used to collect and process data coming from sensors and transmit them to the cloud. Applications that require the range and precision of floating-point (FP) arithmetic can be implemented using efficient hardware FP units (FPUs) or by software emulation. FPUs optimize performance and code size, whilst software emulation minimizes hardware cost. We present a new area-optimized, IEEE 754-compliant RISC-V FPU (Tiny-FPU), and we explore the area, performance, power, and energy...