Stefan Mach

ORCID: 0000-0002-3476-8857
Research Areas
  • Numerical Methods and Algorithms
  • Low-power high-performance VLSI design
  • Parallel Computing and Optimization Techniques
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Memory and Neural Computing
  • Digital Filter Design and Implementation
  • Embedded Systems Design Techniques
  • Advanced Neural Network Applications
  • Analog and Mixed-Signal Circuit Design
  • CCD and CMOS Imaging Sensors
  • Semiconductor materials and devices
  • Advancements in PLL and VCO Technologies
  • EEG and Brain-Computer Interfaces
  • Advancements in Semiconductor Devices and Circuit Design
  • Inertial Sensor and Navigation
  • Quantum Computing Algorithms and Architecture
  • Big Data and Digital Economy
  • Green IT and Sustainability
  • Sensor Technology and Measurement Systems
  • Robotics and Sensor-Based Localization
  • Metal-Catalyzed Oxygenation Mechanisms
  • Bluetooth and Wireless Communication Technologies
  • Metal complexes synthesis and properties
  • Advanced Data Storage Technologies
  • Advanced Sensor and Energy Harvesting Materials

ETH Zurich
2017-2022

University of Bologna
2022

In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with a large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. The adoption of FP formats requiring a lower number of bits is an interesting opportunity to reduce energy consumption, since it makes it possible to simplify the arithmetic circuitry and to reduce the bandwidth required to transfer data between memory and registers, enabling...

DOI: 10.23919/date.2018.8342167 · Article · English · Design, Automation & Test in Europe Conference & Exhibition (DATE) · 2018-03-01
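
The storage and bandwidth argument in this abstract can be illustrated with a small Python sketch. It is not the transprecision platform from the paper: it stores the same sample values (arbitrary, assumed for illustration) as IEEE 754 binary32 and binary16 using the standard `struct` module, then reports the bytes needed and the worst relative rounding error. Halving the width halves the data to move, at the price of a larger rounding error, bounded by 2^-11 for binary16 normal values.

```python
import struct

# Hypothetical sample data; any finite values in the binary16 normal range work.
SAMPLES = [3.14159265, 0.1, 1234.5678, -2.5e-3]

def round_trip(value: float, fmt: str) -> float:
    """Store `value` with the given struct code and read it back."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]

def storage_report(values, fmt):
    """Return (total bytes, worst relative rounding error) for one format."""
    size = struct.calcsize("<" + fmt)
    worst = max(abs(round_trip(v, fmt) - v) / abs(v) for v in values)
    return size * len(values), worst

for fmt, name in (("f", "binary32"), ("e", "binary16")):
    total, worst = storage_report(SAMPLES, fmt)
    print(f"{name}: {total} bytes, worst relative error {worst:.1e}")
```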

The slowdown of Moore's law and the power wall necessitate a shift toward finely tunable precision (a.k.a. transprecision) computing to reduce the energy footprint. Hence, we need circuits capable of performing floating-point operations on a wide range of precisions with high energy proportionality. We present FPnew, a highly configurable open-source transprecision floating-point unit (TP-FPU) supporting both standard and custom FP formats. To demonstrate the flexibility and efficiency of FPnew in general-purpose processor architectures, we extend...

DOI: 10.1109/tvlsi.2020.3044752 · Article · English · IEEE Transactions on Very Large Scale Integration (VLSI) Systems · 2020-12-30
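
FPnew is configured in terms of FP formats. As a rough illustration of what such a parameterization implies (a Python sketch, not FPnew's SystemVerilog interface), the hypothetical `FpFormat` class below derives the bias, largest finite value, smallest normal value, and machine epsilon from the exponent and mantissa widths. It assumes IEEE 754-style encoding with the all-ones exponent reserved for infinities and NaNs; the 5/2 exponent/mantissa split for the 8-bit format is likewise an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FpFormat:
    name: str
    exp_bits: int  # width of the exponent field
    man_bits: int  # width of the stored mantissa (fraction) field

    @property
    def width(self) -> int:
        return 1 + self.exp_bits + self.man_bits  # sign + exponent + mantissa

    @property
    def bias(self) -> int:
        return 2 ** (self.exp_bits - 1) - 1

    @property
    def max_normal(self) -> float:
        # Largest finite value, assuming the all-ones exponent is reserved.
        return (2 - 2.0 ** -self.man_bits) * 2.0 ** self.bias

    @property
    def min_normal(self) -> float:
        return 2.0 ** (1 - self.bias)

    @property
    def epsilon(self) -> float:
        return 2.0 ** -self.man_bits  # gap between 1.0 and the next value

FORMATS = [
    FpFormat("binary64", 11, 52),
    FpFormat("binary32", 8, 23),
    FpFormat("binary16", 5, 10),
    FpFormat("bfloat16", 8, 7),
    FpFormat("fp8 (assumed 5/2 split)", 5, 2),
]

for f in FORMATS:
    print(f"{f.name:24} {f.width:2} bits  max={f.max_normal:.3e}  "
          f"min_normal={f.min_normal:.3e}  eps={f.epsilon:.1e}")
```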

The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 μW fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM,...

DOI: 10.1109/jssc.2021.3114881 · Article · English · IEEE Journal of Solid-State Circuits · 2021-10-07
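
A back-of-the-envelope reading of the figures quoted in the abstract follows. These ratios are derived here, not numbers reported by the authors, and they assume the 49.4 mW refers to the 32.2 GOPS operating point, as the "@" suggests.

```latex
\eta_{\text{peak}} = \frac{32.2\,\text{GOPS}}{49.4\,\text{mW}} \approx 0.652\,\text{GOPS/mW} \approx 652\,\text{GOPS/W}

\frac{P_{\text{peak}}}{P_{\text{sleep}}} = \frac{49.4\,\text{mW}}{1.7\,\mu\text{W}} \approx 2.9 \times 10^{4}
```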

The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 μW fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM and 4 MB of non-volatile MRAM. To...

DOI: 10.1109/isscc42613.2021.9365939 · Article · English · IEEE International Solid-State Circuits Conference (ISSCC) · 2021-02-13

Ultra-low power computing is a key enabler of deeply embedded platforms used in domains such as distributed sensing, the internet of things, and wearable computing. The rising computational demands and high dynamic range of target algorithms often call for hardware support of floating-point (FP) arithmetic and high system energy efficiency. In light of transprecision computing, where the accuracy of data is consciously changed during the execution of applications, custom FP types are being used to optimize a wide range of problems. We two - one 16...

DOI: 10.1109/iscas.2018.8351816 · Article · English · IEEE International Symposium on Circuits and Systems (ISCAS) · 2018-01-01
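
The abstract is truncated before naming its custom types. As an assumption, the sketch below uses the bfloat16 layout (binary32's 8-bit exponent with a 7-bit mantissa) to show how a 16-bit custom type can be derived from binary32 by dropping the low 16 bits with round-to-nearest-even. Function names are hypothetical, and NaN and overflow handling are deliberately omitted.

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def to_bf16_bits(x: float) -> int:
    """Keep the upper 16 bits of binary32, rounding to nearest, ties to even.

    Assumes a finite input; NaN and overflow handling are omitted in this sketch.
    """
    b = f32_bits(x)
    lsb = (b >> 16) & 1            # last kept mantissa bit decides ties
    rounding_bias = 0x7FFF + lsb   # below half rounds down, above half rounds up
    return ((b + rounding_bias) >> 16) & 0xFFFF

def from_bf16_bits(h: int) -> float:
    return bits_f32(h << 16)       # widen by appending 16 zero bits

h = to_bf16_bits(3.14159265)
print(hex(h), from_bf16_bits(h))
```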

RISC-V is an open-source instruction set architecture (ISA) with a modular design consisting of a mandatory base part plus optional extensions. The RV32IMFC ISA configuration has been widely adopted for new-generation, low-power processors. Motivated by the important energy savings that smaller-than-32-bit FP types have enabled in several application domains and related compute platforms, some recent studies have published encouraging early results for their adoption. In this paper we introduce extensions...

DOI: 10.23919/date.2019.8714897 · Article · English · Design, Automation & Test in Europe Conference & Exhibition (DATE) · 2019-03-01

To overcome the limitations of conventional floating-point number formats, an interval arithmetic and variable-width storage format called universal number (unum) has recently been introduced [1]. This paper presents the first (to the best of our knowledge) silicon implementation and measurements of an application-specific integrated circuit (ASIC) for unum arithmetic. The designed chip includes a 128-bit wide unit to execute additions and subtractions, while also supporting lossless (for intermediate results) and lossy...

DOI: 10.1109/iscas.2018.8351546 · Article · English · IEEE International Symposium on Circuits and Systems (ISCAS) · 2018-05-01
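
The unum encoding itself (variable-width fields, the uncertainty bit) is not reproduced here. The minimal Python sketch below only illustrates the generic interval-arithmetic idea behind it: each result is an interval whose bounds are rounded outward, so the exact result is always enclosed. The `Interval` class is hypothetical and uses `math.nextafter` (Python 3.9+) as a conservative one-ulp outward rounding.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of binary64 values (hypothetical helper)."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # [a, b] + [c, d] = [a + c, b + d], each bound pushed one ulp outward
        # so the exact sum stays enclosed despite round-to-nearest.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __sub__(self, other: "Interval") -> "Interval":
        # [a, b] - [c, d] = [a - d, b - c], again rounded outward.
        return Interval(math.nextafter(self.lo - other.hi, -math.inf),
                        math.nextafter(self.hi - other.lo, math.inf))

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi

a = Interval(0.1, 0.2)
b = Interval(0.3, 0.4)
print(a + b, a - b)
```

Outward rounding by a full ulp is looser than directed rounding in hardware, but it keeps the sketch portable.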

We present a fully programmable ultra-low-power embedded platform that hosts an "electronic skin" (E-skin) with arrays of tactile sensors with up to 64 channels, ECG/EMG with up to 8 channels, inertial sensors, and a Bluetooth Low Energy 5.0 module. The platform's compute engine is a heterogeneous multi-core parallel ultra-low power (PULP) processor based on RISC-V, capable of delivering 2.5 GOPS within a 55 mW power consumption envelope, which makes the platform ideal for battery-powered always-on operation. Experimental results show peak...

DOI: 10.1109/iwasi.2019.8791364 · Article · English · 2019-06-01

Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this paper, we propose a multi-core computing cluster that leverages the fine-grained tunable precision principles of transprecision computing to provide floating-point support at a minimum power budget. Our design, based on the open-source RISC-V architecture, combines parallelization and sub-word vectorization with near-threshold operation, leading to a highly scalable and versatile system. We...

DOI: 10.1109/tpds.2021.3101764 · Article · English · IEEE Transactions on Parallel and Distributed Systems · 2021-08-04

Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of NN models and improving the energy efficiency of the underlying hardware architectures. Narrow integer data types have been extensively investigated for NN inference and successfully pushed to the extreme of ternary and binary representations. In contrast, most training-oriented platforms use at least 16-bit floating-point (FP) formats. Lower-precision formats such as 8-bit FP and mixed-precision techniques have only...

DOI: 10.1109/arith54963.2022.00010 · Article · English · 2022-09-01
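
The accumulation problem that motivates mixed-precision techniques can be shown in a few lines. Python's standard library has no 8-bit FP type, so this sketch assumes binary16 (via the `struct` module's `e` code) as a stand-in for the narrow format; the helper names are hypothetical. Summing 4096 products of 1.0 in binary16 stalls at 2048, where the spacing between representable values is 2 and each +1.0 ties back to 2048 under round-to-nearest-even. Keeping the accumulator wide avoids this.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 (round-to-nearest-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot_fp16_accumulate(a, b):
    """Narrow everything: products and the running sum are rounded to binary16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(to_fp16(x) * to_fp16(y)))
    return acc

def dot_wide_accumulate(a, b):
    """binary16 inputs, but products and sum kept in binary64 (expanding)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
    return acc

ones = [1.0] * 4096
print(dot_fp16_accumulate(ones, ones))  # 2048.0: spacing at 2048 is 2, so +1.0 rounds back
print(dot_wide_accumulate(ones, ones))  # 4096.0
```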

The Metis AI Processing Unit (AIPU) is a quad-core System-on-Chip (SoC) designed for edge inference, executing all components of an AI workload on-chip. The AIPU exhibits a performance of 52.4 TOPS per core and a compound throughput of 209.6 TOPS. Key features of the AIPU and its integration into a PCIe card-based system are shown in Fig. 11.3.1. The AIPU leverages the benefits of a quantized digital in-memory computing (D-IMC) architecture, with 8b weights, 8b activations, and full-precision accumulation, to decrease both the memory cost...

DOI: 10.1109/isscc49657.2024.10454395 · Article · English · IEEE International Solid-State Circuits Conference (ISSCC) · 2024-02-18
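
The "8b weights, 8b activations, full-precision accumulation" convention can be sketched in plain Python. This illustrates only the arithmetic, not the D-IMC circuit or the quantization scheme used by Metis; the symmetric per-tensor scaling, the sample values, and all names are assumptions.

```python
def quantize_int8(values, scale):
    """Symmetric linear quantization: q = clamp(round(v / scale), -128, 127)."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int8_dot(qa, qb):
    """Integer dot product; Python ints stand in for a wide accumulator."""
    acc = 0
    for x, y in zip(qa, qb):
        acc += x * y  # one 8b x 8b product fits in 16 bits; the sum can exceed that
    return acc

# Hypothetical per-tensor scales chosen so the largest magnitude maps to 127.
weights = [0.5, -0.25, 0.75, 1.0]
activations = [1.0, 2.0, -1.0, 0.5]
w_scale = max(abs(w) for w in weights) / 127
a_scale = max(abs(a) for a in activations) / 127

q_dot = int8_dot(quantize_int8(weights, w_scale), quantize_int8(activations, a_scale))
approx = q_dot * w_scale * a_scale  # dequantize the accumulated integer result
exact = sum(w * a for w, a in zip(weights, activations))
print(q_dot, approx, exact)
```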

The crisis of Moore's law and new dominant Machine Learning workloads require a paradigm shift towards finely tunable-precision (a.k.a. transprecision) computing. More specifically, we need floating-point circuits that are capable of operating on many formats with high flexibility. We present the first silicon implementation of a 64-bit transprecision floating-point unit. It fully supports the standard double, single, and half precision formats, alongside custom bfloat and 8-bit formats. Operations occur on scalars or in 2-, 4-, or 8-way SIMD...

DOI: 10.1109/vlsi-soc.2019.8920307 · Article · English · 2019-10-01
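
Sub-word SIMD means several narrow FP values share one 64-bit register. A minimal Python sketch with hypothetical helper names packs 2 binary32 or 4 binary16 lanes into a 64-bit integer with `struct` and performs a lane-wise addition, rounding each lane result back to its own width. 8-bit lanes are left out because `struct` has no 8-bit FP code. The sketch models the data layout only, not the unit's datapath.

```python
import struct

LANE_FORMATS = {64: "d", 32: "f", 16: "e"}  # struct codes for IEEE binary64/32/16

def pack_lanes(values, width):
    """Pack 64 // width floats of the given bit width into one 64-bit integer."""
    lanes = 64 // width
    if len(values) != lanes:
        raise ValueError(f"need exactly {lanes} values for {width}-bit lanes")
    raw = struct.pack(f"<{lanes}{LANE_FORMATS[width]}", *values)
    return int.from_bytes(raw, "little")  # lane 0 lands in the low-order bits

def unpack_lanes(word, width):
    lanes = 64 // width
    raw = word.to_bytes(8, "little")
    return list(struct.unpack(f"<{lanes}{LANE_FORMATS[width]}", raw))

def simd_add(word_a, word_b, width):
    """Lane-wise addition; repacking rounds each sum back to the lane width."""
    a, b = unpack_lanes(word_a, width), unpack_lanes(word_b, width)
    return pack_lanes([x + y for x, y in zip(a, b)], width)

w = pack_lanes([1.0, 2.0, 3.0, 4.0], 16)
print(hex(w))  # 0x4400420040003c00
print(unpack_lanes(simd_add(w, w, 16), 16))
```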

The demand for floating-point compute power is ever growing. The domains of big data, machine learning, and scientific computing require a wide precision range and high operational intensity. The sheer number of operations, paired with the increased power density implied by technology scaling, makes it more important than ever to achieve maximum energy efficiency in floating-point operations. In this work, we present Kosmodrom, our novel silicon solution in GlobalFoundries 22 nm Fully-Depleted Silicon on Insulator (FD-SOI)...

DOI: 10.1109/icecs46596.2019.8964820 · Article · English · IEEE International Conference on Electronics, Circuits and Systems (ICECS) · 2019-11-01

Miniaturizing an autonomous robot is a challenging task: not only the mechanical but also the electrical components have to operate within limited space, payload, and power. Furthermore, the algorithms for navigation, such as state-of-the-art (SoA) visual navigation with deep neural networks (DNNs), are becoming increasingly complex, striving for more flexibility and agility. In this work, we present a sensor-rich, modular, nano-sized Unmanned Aerial Vehicle (UAV), almost as small as a five Swiss Franc coin, called...

DOI: 10.23919/date51398.2021.9474262 · Article · English · Design, Automation & Test in Europe Conference & Exhibition (DATE) · 2021-02-01

In the Internet-of-Things (IoT) domain, microcontrollers (MCUs) are used to collect and process data coming from sensors and transmit them to the cloud. Applications that require the range and precision of floating-point (FP) arithmetic can be implemented using efficient hardware floating-point units (FPUs) or by software emulation. FPUs optimize performance and code size, whilst emulation minimizes hardware cost. We present a new area-optimized, IEEE 754-compliant RISC-V FPU (Tiny-FPU), and we explore the area, performance, power, and energy...

DOI: 10.1109/iscas51556.2021.9401149 · Article · English · IEEE International Symposium on Circuits and Systems (ISCAS) · 2021-04-27
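
Software emulation of FP arithmetic starts by taking operands apart into their IEEE 754 fields. The Python sketch below (hypothetical names; not the emulation library or the Tiny-FPU design from the paper) decodes and re-encodes binary32 values and recomputes each value from its sign, biased exponent, and fraction, covering the subnormal and infinity/NaN cases.

```python
import struct

def decode_binary32(x: float):
    """Split x (rounded to binary32) into sign, biased exponent, fraction."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

def encode_binary32(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble the three fields into a binary32 value."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def value_of(sign: int, exponent: int, fraction: int) -> float:
    """Recompute the value from the fields, as a soft-float routine would."""
    if exponent == 0xFF:  # all-ones exponent: infinity or NaN
        return float("nan") if fraction else (-1) ** sign * float("inf")
    if exponent == 0:     # subnormal: no hidden bit, scale fixed at 2^-126
        return (-1) ** sign * fraction * 2.0 ** -149
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)

print(decode_binary32(1.5))  # (0, 127, 4194304)
```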

Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this paper, we propose a multi-core computing cluster that leverages the fine-grained tunable precision principles of transprecision computing to provide floating-point support at a minimum power budget. Our design, based on the open-source RISC-V architecture, combines parallelization and sub-word vectorization with near-threshold operation, leading to a highly scalable and versatile system. We...

DOI: 10.48550/arxiv.2008.12243 · Preprint · English · other-oa · arXiv (Cornell University) · 2020-01-01

The slowdown of Moore's law and the power wall necessitate a shift towards finely tunable precision (a.k.a. transprecision) computing to reduce the energy footprint. Hence, we need circuits capable of performing floating-point operations on a wide range of precisions with high energy proportionality. We present FPnew, a highly configurable open-source transprecision floating-point unit (TP-FPU) supporting both standard and custom FP formats. To demonstrate the flexibility and efficiency of FPnew in general-purpose processor architectures,...

DOI: 10.48550/arxiv.2007.01530 · Preprint · English · other-oa · arXiv (Cornell University) · 2020-01-01