NFDI4DS | UHH-SEMS - Publication Details

Luka Macan

ORCID: 0009-0007-6130-8841


About

Contact & Profiles

OpenAlex ID: A5093777119

Research Areas

Parallel Computing and Optimization Techniques
Advanced Neural Network Applications
Topic Modeling
Distributed and Parallel Computing Systems
Robotics and Sensor-Based Localization
Graph Theory and Algorithms
Network Packet Processing and Optimization
UAV Applications and Optimization
Robotic Path Planning Algorithms
Advanced Memory and Neural Computing
CCD and CMOS Imaging Sensors

University of Bologna
2023-2024

Laboratori Guglielmo Marconi (Italy)
2023

Marconi University
2023

Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

OPENALEX - Publications

Philip Wiese Gamze İslamoğlu Moritz Scherer Luka Macan Victor J. B. Jung and 3 more

10.1109/mdat.2025.3527371 article EN IEEE Design and Test 2025-01-01

Deeploy: Enabling Energy-Efficient Deployment of Small Language Models on Heterogeneous Microcontrollers

OPENALEX - Publications

Moritz Scherer Luka Macan Victor J. B. Jung Philip Wiese Luca Bompani and 3 more

With the rise of embodied foundation models (EFMs), most notably small language models (SLMs), adapting Transformers for edge applications has become a very active field of research. However, achieving end-to-end deployment of SLMs on microcontroller (MCU)-class chips without high-bandwidth off-chip main memory access is still an open challenge. In this article, we demonstrate high-efficiency end-to-end SLM deployment on a multicore RISC-V (RV32) MCU augmented with ML instruction extensions and a hardware neural processing unit...

10.1109/tcad.2024.3443718 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2024-11-01

Distilling Tiny and Ultrafast Deep Neural Networks for Autonomous Navigation on Nano-UAVs

OPENALEX - Publications

Lorenzo Lamberti Lorenzo Bellone Luka Macan Enrico Natalizio Francesco Conti and 2 more

Nano-sized unmanned aerial vehicles (UAVs) are ideal candidates for flying Internet-of-Things smart sensors to collect information in narrow spaces. This requires ultra-fast navigation under very tight memory/computation constraints. The PULP-Dronet convolutional neural network (CNN) enables autonomous navigation running aboard a nano-UAV at 19 frame/s, at the cost of a large memory footprint of 320 kB, and with drone control in complex scenarios hindered by the disjoint training of collision avoidance and steering capabilities. In...

10.1109/jiot.2024.3431913 article EN cc-by IEEE Internet of Things Journal 2024-07-22

Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

OPENALEX - Publications

Philip Wiese Gamze İslamoğlu Moritz Scherer Luka Macan Victor J. B. Jung and 3 more

One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators, supported by an automated deployment flow. We demonstrate an Attention-based model in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit MobileBERT, achieving leading-edge...

10.48550/arxiv.2408.02473 preprint EN arXiv (Cornell University) 2024-08-05

Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers

OPENALEX - Publications

Moritz Scherer Luka Macan Victor J. B. Jung Philip Wiese Luca Bompani and 3 more

With the rise of Embodied Foundation Models (EFMs), most notably Small Language Models (SLMs), adapting Transformers for edge applications has become a very active field of research. However, achieving end-to-end deployment of SLMs on microcontroller (MCU)-class chips without high-bandwidth off-chip main memory access is still an open challenge. In this paper, we demonstrate high-efficiency end-to-end SLM deployment on a multicore RISC-V (RV32) MCU augmented with ML instruction extensions and a hardware neural processing unit (NPU)...

10.48550/arxiv.2408.04413 preprint EN arXiv (Cornell University) 2024-08-08

Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs

OPENALEX - Publications

Lorenzo Lamberti Lorenzo Bellone Luka Macan Enrico Natalizio Francesco Conti and 2 more

Nano-sized unmanned aerial vehicles (UAVs) are ideal candidates for flying Internet-of-Things smart sensors to collect information in narrow spaces. This requires ultra-fast navigation under very tight memory/computation constraints. The PULP-Dronet convolutional neural network (CNN) enables autonomous navigation running aboard a nano-UAV at 19 frame/s, at the cost of a large memory footprint of 320 kB, and with drone control in complex scenarios hindered by the disjoint training of collision avoidance and steering...

10.48550/arxiv.2407.12675 preprint EN arXiv (Cornell University) 2024-07-17

WIP: Automatic DNN Deployment on Heterogeneous Platforms: the GAP9 Case Study

OPENALEX - Publications

Luka Macan Alessio Burrello Luca Benini Francesco Conti

Emerging Artificial-Intelligence-enabled System-on-Chips (AI-SoCs) combine a flexible microcontroller with parallel Digital Signal Processors (DSP) and heterogeneous acceleration capabilities. In this Work-in-Progress paper, we focus on the GAP9 RISC-V SoC as a case study to show how the open-source DORY Deep Neural Network (DNN) tool flow can be extended for heterogeneous acceleration by fine-grained interleaving of a dedicated Neural Engine and the cluster cores. Our results show that up to 91% of the peak accelerator throughput can be extracted in end-to-end...

10.1145/3607889.3609092 article EN 2023-09-17
