- Numerical Methods and Algorithms
- Low-power high-performance VLSI design
- Parallel Computing and Optimization Techniques
- Ferroelectric and Negative Capacitance Devices
- Advanced Memory and Neural Computing
- Digital Filter Design and Implementation
- Embedded Systems Design Techniques
- Advanced Neural Network Applications
- Analog and Mixed-Signal Circuit Design
- CCD and CMOS Imaging Sensors
- Semiconductor materials and devices
- Advancements in PLL and VCO Technologies
- EEG and Brain-Computer Interfaces
- Advancements in Semiconductor Devices and Circuit Design
- Inertial Sensor and Navigation
- Quantum Computing Algorithms and Architecture
- Big Data and Digital Economy
- Green IT and Sustainability
- Sensor Technology and Measurement Systems
- Robotics and Sensor-Based Localization
- Metal-Catalyzed Oxygenation Mechanisms
- Bluetooth and Wireless Communication Technologies
- Metal complexes synthesis and properties
- Advanced Data Storage Technologies
- Advanced Sensor and Energy Harvesting Materials
ETH Zurich
2017-2022
University of Bologna
2022
In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with a large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. The adoption of FP formats requiring a lower number of bits is an interesting opportunity to reduce energy consumption, since it allows the arithmetic circuitry to be simplified and reduces the memory bandwidth required to transfer data between registers and memory, enabling...
The slowdown of Moore's law and the power wall necessitate a shift toward finely tunable precision (a.k.a. transprecision) computing to reduce the energy footprint. Hence, we need circuits capable of performing floating-point operations on a wide range of precisions with high energy-proportionality. We present FPnew, a highly configurable open-source transprecision floating-point unit (TP-FPU) supporting standard and custom FP formats. To demonstrate the flexibility and efficiency of FPnew in general-purpose processor architectures, we extend...
The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 μW fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) of peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM and 4 MB of non-volatile MRAM...
Ultra-low-power computing is a key enabler of deeply embedded platforms used in domains such as distributed sensing, the Internet of Things, and wearable computing. The rising computational demands and high dynamic range of target algorithms often call for hardware support for floating-point (FP) arithmetic to improve system energy efficiency. In light of transprecision computing, where the accuracy of data is consciously changed during the execution of applications, custom FP types are being used to optimize a wide range of problems. We present two such types - one 16...
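A reduced 16-bit FP type of the kind discussed here can be sketched in a few lines of Python. The bfloat16-style layout below (1 sign, 8 exponent, 7 mantissa bits, obtained by truncating a binary32 word) is chosen purely for illustration and is not necessarily the exact format introduced in the paper:

```python
import struct

def float32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE 754 binary32 value to a bfloat16-style
    16-bit pattern (1 sign, 8 exponent, 7 mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16  # keep the top 16 bits, drop 16 mantissa bits

def bf16_bits_to_float32(b: int) -> float:
    """Expand a 16-bit pattern back to binary32 by zero-padding
    the discarded mantissa bits."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# 1.0 is exactly representable in 7 mantissa bits, so it round-trips;
# 3.14159 is recovered only to ~2-3 decimal digits.
y = bf16_bits_to_float32(float32_to_bf16_bits(3.14159))
```

Because the exponent field keeps its full 8 bits, such a format trades mantissa precision for the same dynamic range as binary32, which is exactly the property that makes narrow FP types attractive for near-sensor workloads.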
RISC-V is an open-source instruction set architecture (ISA) with a modular design consisting of a mandatory base part plus optional extensions. The RV32IMFC ISA configuration has been widely adopted for new-generation low-power processors. Motivated by the important energy savings that smaller-than-32-bit FP types have enabled in several application domains and related compute platforms, some recent studies have published encouraging early results for their adoption. In this paper we introduce extensions...
To overcome the limitations of conventional floating-point number formats, an interval arithmetic and variable-width storage format called the universal number (unum) has been recently introduced [1]. This paper presents the first (to the best of our knowledge) silicon implementation and measurements of an application-specific integrated circuit (ASIC) for unum arithmetic. The designed chip includes a 128-bit wide unit to execute additions and subtractions, while also supporting lossless (for intermediate results) and lossy...
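The guarantee that interval-based formats such as unum provide can be illustrated with a minimal Python sketch. This uses plain float pairs with outward rounding, not the actual variable-width unum encoding:

```python
import math

def interval_add(a, b):
    """Add closed intervals (lo, hi) with outward rounding: each
    bound is widened by one ulp via math.nextafter, so the exact
    real-valued sum is guaranteed to lie inside the result even
    though the float additions themselves round."""
    lo = math.nextafter(a[0] + b[0], -math.inf)
    hi = math.nextafter(a[1] + b[1], math.inf)
    return (lo, hi)

x = (0.1, 0.2)
y = (0.3, 0.4)
lo, hi = interval_add(x, y)  # encloses the exact sum [0.4, 0.6]
```

A hardware unum unit performs the same kind of rigorous bound tracking directly on its variable-width encoding, which is what allows lossy compression of results without silently losing correctness information.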
We present a fully programmable ultra-low-power embedded platform that hosts an "electronic skin" (E-skin) with arrays of tactile sensors of up to 64 channels, ECG/EMG and 8 inertial sensors, and a Bluetooth Low Energy 5.0 module. The platform's compute engine is a heterogeneous multi-core parallel ultra-low-power (PULP) processor based on RISC-V, capable of delivering 2.5 GOPS within a 55 mW consumption envelope, which makes the platform ideal for battery-powered always-on operation. Experimental results show peak...
Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this paper, we propose a multi-core cluster that leverages the fine-grained tunable principles of transprecision computing to provide FP support at a minimum power budget. Our design - based on the open-source RISC-V architecture - combines parallelization and sub-word vectorization with near-threshold operation, leading to a highly scalable and versatile system. We...
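Sub-word vectorization of the kind used in such clusters treats one machine word as several independent narrow lanes whose carries never cross the lane boundary. The Python sketch below is a behavioral model of this idea for two unsigned 16-bit lanes packed into a 32-bit word, not the actual RTL of the design:

```python
def simd_add16(a: int, b: int) -> int:
    """Lane-wise addition of two 32-bit words holding two unsigned
    16-bit lanes each. Each lane wraps modulo 2**16 independently,
    so a carry out of the low lane never reaches the high lane."""
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    hi = (((a >> 16) + (b >> 16)) & 0xFFFF) << 16
    return hi | lo

# Low lane overflows (0xFFFF + 0x0001 wraps to 0x0000) without
# disturbing the high lane (0x0001 + 0x0001 = 0x0002).
r = simd_add16(0x0001FFFF, 0x00010001)
```

In hardware, the same effect is obtained nearly for free by gating the carry chain at the lane boundary, which is why sub-word SIMD doubles or quadruples throughput on narrow data at almost no area cost.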
Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of NN models and improving the energy efficiency of the underlying hardware architectures. Narrow integer data types have been vastly investigated for inference and successfully pushed to the extreme of ternary and binary representations. In contrast, most training-oriented platforms use at least 16-bit floating-point (FP) formats. Lower-precision formats, such as 8-bit FP and mixed-precision techniques, have only...
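The narrow-integer side of this trade-off is easy to demonstrate: symmetric per-tensor quantization maps float weights onto int8 with a single scale factor. The sketch below is a generic illustration of the technique, not the scheme of any specific platform mentioned here:

```python
def quantize_int8(w):
    """Symmetric per-tensor quantization: one scale maps the float
    range [-max|w|, +max|w|] onto the int8 range [-127, 127]."""
    scale = max(abs(x) for x in w) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in w]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # each entry within half a quantization step of w
```

The reconstruction error is bounded by half a step (scale / 2), which is acceptable for inference but is precisely what makes integer-only training hard and keeps training platforms on 16-bit (or, more recently, 8-bit) FP formats.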
The Metis AI Processing Unit (AIPU) is a quad-core System-on-Chip (SoC) designed for edge inference, executing all components of an AI workload on-chip. The AIPU exhibits a performance of 52.4 TOPS per core and a compound throughput of 209.6 TOPS. Key features of the SoC and its integration into a PCIe card-based system are shown in Fig. 11.3.1. Metis leverages the benefits of a quantized digital in-memory computing (D-IMC) architecture - with 8b weights, 8b activations, and full-precision accumulation - to decrease both the memory cost...
The crisis of Moore's law and the new dominant Machine Learning workloads require a paradigm shift towards finely tunable-precision (a.k.a. transprecision) computing. More specifically, we need floating-point circuits that are capable of operating on many formats with high flexibility. We present the first silicon implementation of a 64-bit transprecision unit. It fully supports standard double, single, and half precision, alongside custom bfloat and 8-bit formats. Operations occur on scalars or 2-, 4-, or 8-way SIMD...
The demand for floating-point compute power is ever growing. The domains of big data, machine learning, and scientific computing require a wide precision range and high operational intensity. The sheer number of FP operations, paired with the increased density implied by technology scaling, makes it more important than ever to achieve maximum energy efficiency for floating-point operations. In this work, we present Kosmodrom, our novel silicon solution in GlobalFoundries 22 nm Fully-Depleted Silicon-on-Insulator (FD-SOI)...
Miniaturizing an autonomous robot is a challenging task - not only the mechanical but also the electrical components have to operate within limited space, payload, and power. Furthermore, algorithms for autonomous navigation, such as state-of-the-art (SoA) visual-navigation deep neural networks (DNNs), are becoming increasingly complex, striving for more flexibility and agility. In this work, we present a sensor-rich, modular, nano-sized Unmanned Aerial Vehicle (UAV), almost as small as a five Swiss Franc coin, called...
In the Internet-of-Things (IoT) domain, microcontrollers (MCUs) are used to collect and process data coming from sensors and transmit them to the cloud. Applications that require the range and precision of floating-point (FP) arithmetic can be implemented using efficient hardware FP units (FPUs) or by software emulation. FPUs optimize performance and code size, whilst software emulation minimizes hardware cost. We present a new area-optimized, IEEE 754-compliant RISC-V FPU (Tiny-FPU), and we explore the area, performance, power, and energy...