Gianna Paulin

ORCID: 0000-0002-1310-0911
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Memory and Neural Computing
  • Advanced Neural Network Applications
  • Ferroelectric and Negative Capacitance Devices
  • CCD and CMOS Imaging Sensors
  • Neural Networks and Applications
  • Advancements in Semiconductor Devices and Circuit Design
  • Embedded Systems Design Techniques
  • Analog and Mixed-Signal Circuit Design
  • Speech Recognition and Synthesis
  • Interconnection Networks and Systems
  • Speech and Audio Processing
  • Low-power high-performance VLSI design
  • Advanced Data Storage Technologies
  • Semiconductor materials and devices
  • Advanced Wireless Communication Techniques
  • Advanced Data Compression Techniques
  • Electronic Packaging and Soldering Technologies
  • Blind Source Separation Techniques
  • PAPR reduction in OFDM
  • 3D IC and TSV technologies
  • Energy Harvesting in Wireless Networks
  • VLSI and Analog Circuit Testing

ETH Zurich
2018-2024

University of Bologna
2023

Innovation Cluster (Canada)
2023

National University of Singapore
2023

Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) SoCs [1–4] for augmented reality, personalized healthcare and nano-robotics need to run a large variety of tasks within a power envelope of a few tens of mW: compute-intensive but bit-precision-tolerant Deep Neural Networks (DNNs), as well as signal processing and control requiring high-precision floating-point. Performance and energy constraints vary greatly between different applications and even between stages of the same application. We present Marsellus...

10.1109/isscc42615.2023.10067643 article EN 2023 IEEE International Solid-State Circuits Conference (ISSCC) 2023-02-19

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present CHIPMUNK, a small (<1 mm²) hardware accelerator for Long Short-Term Memory networks in UMC 65 nm technology capable...

10.1109/cicc.2018.8357068 article EN 2018 IEEE Custom Integrated Circuits Conference (CICC) 2018-04-01

Emerging artificial intelligence-enabled Internet-of-Things (AI-IoT) systems-on-chip (SoCs) for augmented reality, personalized healthcare, and nanorobotics need to run many diverse tasks within a power envelope of a few tens of mW over a wide range of operating conditions: compute-intensive but strongly quantized deep neural network (DNN) inference, as well as signal processing and control requiring high-precision floating point. We present MARSELLUS, an all-digital heterogeneous SoC for AI-IoT end-nodes...

10.1109/jssc.2023.3318301 article EN IEEE Journal of Solid-State Circuits 2023-10-03

Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of NN models and improving the energy efficiency of the underlying hardware architectures. Narrow integer data types have been vastly investigated for inference and successfully pushed to extreme ternary and binary representations. In contrast, most training-oriented platforms use at least 16-bit floating-point (FP) formats. Lower-precision formats, such as 8-bit FP and mixed-precision techniques, have only...

10.1109/arith54963.2022.00010 article EN 2022-09-01
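The precision loss these formats trade against efficiency can be illustrated with a toy rounding model. The sketch below rounds a value to a given number of mantissa bits; it is an illustrative simplification that ignores the exponent range, subnormals and special values that real 8-bit FP formats (e.g. E4M3) define.

```python
import numpy as np

def quantize_fp(x, n_mant):
    """Round x to n_mant explicit mantissa bits (round-to-nearest),
    leaving the exponent unbounded. Toy model of low-precision FP:
    it shows mantissa precision loss only."""
    m, e = np.frexp(x)                # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (n_mant + 1)       # +1 for the implicit leading bit
    return np.ldexp(np.round(m * scale) / scale, e)

# 0.3 with 3 mantissa bits (E4M3-like precision) rounds to 0.3125,
# while 10 mantissa bits (FP16-like) lands much closer to 0.3.
print(quantize_fp(0.3, 3), quantize_fp(0.3, 10))
```

Sweeping `n_mant` this way is a quick first check of whether a workload tolerates a narrower format before touching hardware.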

Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets inference on embedded...

10.1109/islped58423.2023.10244348 article EN 2023-08-07
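The arithmetic-intensity challenge mentioned in the abstract can be made concrete with a rough roofline-style estimate for the two matmuls in a self-attention head. The cost model below is an illustrative simplification (each operand streamed once), not ITA's actual dataflow.

```python
def attention_intensity(seq_len, d_head, bytes_per_elem=1.0):
    """Rough arithmetic intensity (ops/byte) of the Q@K^T and P@V
    matmuls in one self-attention head, assuming Q, K, V and the
    output are streamed once and the seq_len x seq_len score matrix
    is written and read once. Illustrative cost model only."""
    ops = 2 * (2 * seq_len * seq_len * d_head)            # 2 matmuls, MAC = 2 ops
    traffic = (4 * seq_len * d_head                       # Q, K, V, output
               + 2 * seq_len * seq_len) * bytes_per_elem  # scores: write + read
    return ops / traffic

# Intensity grows with sequence length, shifting the kernel from
# memory-bound toward compute-bound on a fixed accelerator.
for s in (64, 256, 1024):
    print(s, round(attention_intensity(s, d_head=64), 1))
```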

Radio resource management (RRM) is critical in 5G mobile communications due to its ubiquity on every radio device and its low latency constraints. The rapidly evolving RRM algorithms and their requirements, combined with the dense and massive base station deployment, ask for an on-the-edge acceleration system with a tradeoff between flexibility, efficiency, and cost, making application-specific instruction-set processors (ASIPs) an optimal choice. In this work, we start from a baseline, simple RISC-V core and introduce...

10.1109/tvlsi.2021.3093242 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2021-07-12

Recurrent neural networks such as Long Short-Term Memories (LSTMs) learn temporal dependencies by keeping an internal state, making them ideal for time-series problems such as speech recognition. However, the output-to-input feedback creates distinctive memory bandwidth and scalability challenges in designing accelerators for RNNs. We present Muntaniala, an RNN accelerator architecture for LSTM inference with a silicon-measured energy efficiency of 3.25 TOP/s/W and a performance of 30.53 GOP/s in UMC 65 nm technology...

10.1109/tcsi.2021.3099716 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2021-07-30
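The output-to-input feedback behind these bandwidth challenges is visible directly in the LSTM recurrence: every time step re-reads the full weight matrices and depends on the previous step's output. A minimal NumPy sketch (dimensions and initialization are illustrative, not Muntaniala's configuration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. The gates depend on the previous output
    h_prev -- the output-to-input feedback that serializes the
    computation and forces W and U to be re-fetched every step."""
    z = W @ x + U @ h_prev + b                    # all four gates at once
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)   # internal state
    h = sigmoid(o) * np.tanh(c)                         # new output
    return h, c

# Illustrative sizes: 16-dim input, 32 hidden units, 5 time steps.
rng = np.random.default_rng(0)
n_in, n_h = 16, 32
W = rng.standard_normal((4 * n_h, n_in)) * 0.1
U = rng.standard_normal((4 * n_h, n_h)) * 0.1
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):            # sequential: step t needs h from step t-1
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
```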

We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of cores with custom extensions, two 64-bit host cores, a latency-tolerant multi-chiplet interconnect, and 32 GiB of HBM2E memory. It achieves leading-edge utilization on stencils (83 %), sparse-dense (42 %) and sparse-sparse (49 %) matrix multiply.

10.48550/arxiv.2406.15068 preprint EN arXiv (Cornell University) 2024-06-21
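For reference, the sparse-dense kernel class these utilization figures refer to is a CSR sparse-times-dense matrix multiply. The sketch below is a minimal reference implementation of the operation, not Occamy's optimized kernel (real kernels block and vectorize this loop).

```python
import numpy as np

def csr_spmm(indptr, indices, data, B):
    """C = A @ B with sparse A in CSR form (indptr/indices/data).
    Row r of A holds nonzeros data[indptr[r]:indptr[r+1]] in the
    columns listed in indices over the same slice."""
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for r in range(n_rows):
        for k in range(indptr[r], indptr[r + 1]):
            C[r] += data[k] * B[indices[k]]   # accumulate scaled rows of B
    return C

# A = [[1, 0, 2], [0, 3, 0]] in CSR form, multiplied by the identity,
# recovers A itself.
C = csr_spmm([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], np.eye(3))
```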

As a contribution to projects like the European Processor Initiative (EPI) as well as the Stencil and Tensor Accelerator (STX), Fraunhofer IZM has further developed its advanced packaging portfolio with a special focus on wafer-level packaging of high performance computing (HPC) modules. This includes the scaling of the well-established multi-layer copper redistribution technology to enable 4 μm line/space routing (8 μm pitch) over multiple layers with a 6 μm thick polymer interlayer dielectric and micro vias of 8 μm diameter. The redistribution layer (RDL)...

10.1109/ectc51529.2024.00340 article EN 2024-05-28

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm²) hardware accelerator for Long Short-Term Memory networks in UMC 65 nm technology, capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW power. To implement big RNN models without...

10.48550/arxiv.1711.05734 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Modern high-performance computing architectures (Multicore, GPU, Manycore) are based on tightly-coupled clusters of processing elements, physically implemented as rectangular tiles. Their size and aspect ratio strongly impact the achievable operating frequency and energy efficiency, but they should be as flexible as possible to achieve a high utilization of the top-level die floorplan. In this paper, we explore the flexibility range of a cluster of RISC-V cores with shared L1 memory used to build scalable accelerators,...

10.1109/isvlsi54635.2022.00021 article EN 2022-07-01

Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) Systems-on-a-Chip (SoCs) for augmented reality, personalized healthcare, and nano-robotics need to run many diverse tasks within a power envelope of a few tens of mW over a wide range of operating conditions: compute-intensive but strongly quantized Deep Neural Network (DNN) inference, as well as signal processing and control requiring high-precision floating-point. We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes...

10.48550/arxiv.2305.08415 preprint EN other-oa arXiv (Cornell University) 2023-01-01

With the rise of deep learning (DL), our world braces for artificial intelligence (AI) in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high-throughput, reliable and secure AI processing at ultra-low power (ULP), with a very short time to market. With its strong legacy solutions and open platforms, the EU is well-positioned to become a leader in this field. However, this requires SoCs that are at least 100 times more energy-efficient, while offering sufficient flexibility and scalability to deal...

10.23919/date56975.2023.10136926 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2023-04-01

Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets inference on embedded...

10.48550/arxiv.2307.03493 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of NN models and improving the energy efficiency of the underlying hardware architectures. Narrow integer data types have been vastly investigated for inference and successfully pushed to extreme ternary and binary representations. In contrast, most training-oriented platforms use at least 16-bit floating-point (FP) formats. Lower-precision formats, such as 8-bit FP and mixed-precision techniques, have only...

10.48550/arxiv.2207.03192 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Modern high-performance computing architectures (Multicore, GPU, Manycore) are based on tightly-coupled clusters of processing elements, physically implemented as rectangular tiles. Their size and aspect ratio strongly impact the achievable operating frequency and energy efficiency, but they should be as flexible as possible to achieve a high utilization of the top-level die floorplan. In this paper, we explore the flexibility range of a cluster of RISC-V cores with shared L1 memory used to build scalable accelerators,...

10.48550/arxiv.2209.00889 preprint EN cc-by arXiv (Cornell University) 2022-01-01