Lukas Cavigelli

ORCID: 0000-0003-1767-7715
Research Areas
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • EEG and Brain-Computer Interfaces
  • Advanced Image and Video Retrieval Techniques
  • Neuroscience and Neural Engineering
  • CCD and CMOS Imaging Sensors
  • Brain Tumor Detection and Classification
  • Ferroelectric and Negative Capacitance Devices
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Parallel Computing and Optimization Techniques
  • Advanced Vision and Imaging
  • Neural Networks and Applications
  • Speech and Audio Processing
  • Image and Signal Denoising Methods
  • Music and Audio Processing
  • Advanced Image Processing Techniques
  • Blind Source Separation Techniques
  • Neural dynamics and brain function
  • Innovative Energy Harvesting Technologies
  • Image Enhancement Techniques
  • Speech Recognition and Synthesis
  • Energy Harvesting in Wireless Networks
  • Embedded Systems Design Techniques
  • Advanced Sensor and Energy Harvesting Materials

Huawei Technologies (France)
2022-2024

ETH Zurich
2013-2021

University of Bologna
2021

Board of the Swiss Federal Institutes of Technology
2018

We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: image compression and neural network compression. While these tasks have typically been approached with different methods, our soft-to-hard quantization approach gives results competitive with the state-of-the-art for both.

10.48550/arxiv.1704.00648 preprint EN other-oa arXiv (Cornell University) 2017-01-01
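
The soft-to-hard idea in the abstract above can be illustrated with a minimal sketch. It assumes scalar inputs, a softmax over negative squared distances as the soft assignment, and a hypothetical sharpness parameter `sigma` that is increased during training; none of these names come from the listing itself. As `sigma` grows, the soft output approaches the nearest-center (hard) quantization.

```python
import math

def soft_quantize(x, centers, sigma):
    """Soft-assign scalar x to centers; return the softmax-weighted center."""
    # Softmax over negative squared distances; sigma is the annealing knob.
    logits = [-sigma * (x - c) ** 2 for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, centers)) / total

def hard_quantize(x, centers):
    """Nearest-center (discrete) quantization."""
    return min(centers, key=lambda c: (x - c) ** 2)

centers = [-1.0, 0.0, 1.0]  # illustrative quantization levels
x = 0.3
for sigma in (1.0, 10.0, 1000.0):
    print(sigma, soft_quantize(x, centers, sigma))
print("hard:", hard_quantize(x, centers))
```

For small `sigma` the output is a smooth blend of the centers, which keeps it differentiable; for large `sigma` it is effectively the discrete value.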

Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face...

10.1109/tcad.2017.2682138 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2017-03-15

Convolutional Neural Networks (CNNs) have revolutionized the world of image classification over the last few years, pushing computer vision close to, and beyond, human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors and GP-GPUs. Recent efforts in designing CNN Application-Specific Integrated Circuits (ASICs) and accelerators for System-On-Chip (SoC) integration have achieved very promising results. Unfortunately, even these highly optimized engines are still above...

10.1109/isvlsi.2016.111 article EN 2016-07-01

In recent years, deep learning (DL) has contributed significantly to the improvement of motor-imagery brain-machine interfaces (MI-BMIs) based on electroencephalography (EEG). While achieving high classification accuracy, DL models have also grown in size, requiring a vast amount of memory and computational resources. This poses a major challenge to an embedded BMI solution that guarantees user privacy, reduced latency, and low power consumption by processing the data locally. In this paper, we propose...

10.1109/smc42975.2020.9283028 article EN 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2020-10-11

An ever-increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow, and superresolution. Hardware acceleration of these algorithms is essential to adopt these improvements in embedded and mobile systems. We present a new architecture, design, and implementation, as well as the first reported silicon measurements of such an...

10.1109/tcsvt.2016.2592330 article EN IEEE Transactions on Circuits and Systems for Video Technology 2016-07-18

Today, advanced computer vision (CV) systems of ever-increasing complexity are being deployed in a growing number of application scenarios with strong real-time and power constraints. Current trends in CV clearly show a rise of neural network-based algorithms, which have recently broken many object detection and localization records. These approaches are very flexible and can be used to tackle many different challenges by only changing their parameters. In this paper, we present the first convolutional network...

10.1145/2742060.2743766 article EN 2015-05-19

The growing number of low-power smart devices in the Internet of Things is coupled with the concept of "edge computing," that is, moving some of the intelligence, especially machine learning, toward the edge of the network. Enabling machine learning algorithms to run on resource-constrained hardware, typically on low-power smart devices, is challenging in terms of hardware (optimized and energy-efficient integrated circuits), algorithmic, and firmware implementations. This article presents FANN-on-MCU, an open-source toolkit built upon the fast artificial neural...

10.1109/jiot.2020.2976702 article EN IEEE Internet of Things Journal 2020-02-27

Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into areas of low-level processing to solve regression problems, mostly with relatively shallow networks. We present a novel 12-layer...

10.1109/ijcnn.2017.7965927 article EN 2017 International Joint Conference on Neural Networks (IJCNN) 2017-05-01

Today there is a clear trend towards deploying advanced computer vision (CV) systems in a growing number of application scenarios with strong real-time and power constraints. Brain-inspired algorithms capable of achieving record-breaking results combined with embedded systems are the best candidates for future CV and video systems due to their flexibility and high accuracy in the area of image understanding. In this paper, we present an optimized convolutional network implementation suitable for scene labeling on embedded platforms. We show that...

10.1145/2744769.2744788 article EN 2015-06-02

Vector architectures are gaining traction for highly efficient processing of data-parallel workloads, driven by all major ISAs (RISC-V, Arm, Intel), and boosted by landmark chips, like the Arm SVE-based Fujitsu A64FX, powering the TOP500 leader Fugaku. The RISC-V V extension has recently reached 1.0-Frozen status. Here, we present its first open-source implementation, discuss the new specification's impact on the micro-architecture of a lane-based design, and provide insights on performance-oriented design of coupled...

10.1109/asap54787.2022.00017 preprint EN 2022-07-01

We propose Laelaps, an energy-efficient and fast learning algorithm with no false alarms for epileptic seizure detection from long-term intracranial electroencephalography (iEEG) signals. Laelaps uses end-to-end binary operations by exploiting symbolic dynamics and brain-inspired hyperdimensional computing. Laelaps's results surpass those yielded by state-of-the-art (SoA) methods [1], [2], [3], including deep learning, on a new very large dataset containing 116 seizures of 18 drug-resistant epilepsy...

10.23919/date.2019.8715186 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2019 2019-03-01

The mitigation of rapid mass movements involves a subtle interplay between field surveys, numerical modelling, and experience. Hazard engineers rely on a combination of best practices and, if available, historical facts as a vital prerequisite in establishing reproducible and accurate hazard zoning. Full-scale tests have been performed to reinforce the physical understanding of debris flows and snow avalanches. Rockfall dynamics are - especially the quantification of energy dissipation during the complex...

10.1038/s41467-021-25794-y article EN cc-by Nature Communications 2021-09-20

Accurate, fast, and reliable multiclass classification of electroencephalography (EEG) signals is a challenging task towards the development of motor imagery brain-computer interface (MI-BCI) systems. We propose enhancements to different feature extractors, along with a support vector machine (SVM) classifier, to simultaneously improve classification accuracy and execution time during training and testing. We focus on the well-known common spatial pattern (CSP) and Riemannian covariance methods, and significantly extend these two...

10.23919/eusipco.2018.8553378 preprint EN 2018 26th European Signal Processing Conference (EUSIPCO) 2018-09-01

Personalized ubiquitous healthcare solutions require energy-efficient wearable platforms that provide an accurate classification of bio-signals while consuming low average power for long-term battery-operated use. Single-lead electrocardiogram (ECG) signals provide the ability to detect, classify, and even predict cardiac arrhythmia. In this paper, we propose a novel temporal convolutional network (TCN) that achieves high accuracy while still being feasible for wearable platform use. Experimental results on the ECG5000 dataset show...

10.1109/aicas51828.2021.9458520 article EN 2021-06-06

Rockfalls have over the last decades become a serious and frequent hazard, especially due to larger variations in precipitation and temperatures, destabilizing rocky slopes in mountainous regions. Hence, civil engineers are applying the latest simulation tools to perform risk assessments and plan mitigation strategies. These tools are based on various models with many parameters that should be calibrated and evaluated with real-world in-field measurement data. In this work, we present a rugged and low-power multi-sensor node termed...

10.1109/tim.2017.2770799 article EN IEEE Transactions on Instrumentation and Measurement 2017-11-28

Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory. This precludes the implementation of CNNs in low-power embedded systems. Recent research shows that CNNs sustain extreme quantization, binarizing their weights and intermediate feature maps, thereby saving 8-32x memory and collapsing energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNORBIN, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse supporting even...

10.1109/coolchips.2018.8373076 article EN 2018-04-01
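
The XNOR-and-popcount operation mentioned in the abstract above can be sketched in a few lines. This is a generic software illustration, not the accelerator's datapath; it assumes vectors with entries in {-1, +1} packed into integers (bit set for +1), and the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors given as packed bit words."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # 1 wherever the two signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements minus disagreements

a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))   # ordinary sum-of-products
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(result, reference)
```

Each agreeing bit contributes +1 and each disagreeing bit contributes -1, so the sum of products reduces to `2 * popcount(xnor) - n`.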

Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation of CNNs for video data recorded with a static camera setting, exploiting the spatio-temporal sparsity of pixel changes....

10.1145/3131885.3131906 article EN 2017-09-05
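
The change-based evaluation idea in the abstract above can be illustrated with a simplified sketch. It is not the published algorithm: it assumes a single 1-D "valid" convolution instead of a full CNN, a hypothetical per-pixel `threshold` for change detection, and a held input state that keeps the old value for pixels whose change is below the threshold. Only outputs whose receptive field contains a changed pixel are recomputed.

```python
def conv1d_valid(signal, kernel):
    """Plain 'valid' 1-D correlation, used as the full-recompute reference."""
    k = len(kernel)
    return [sum(signal[j + t] * kernel[t] for t in range(k))
            for j in range(len(signal) - k + 1)]

def change_based_update(prev_in, prev_out, new_in, kernel, threshold):
    """Update prev_out for new_in, recomputing only outputs near changed inputs.

    Returns (held input state, updated output, number of recomputed outputs).
    """
    k = len(kernel)
    state = list(prev_in)
    out = list(prev_out)
    dirty = set()
    for i, (old, new) in enumerate(zip(prev_in, new_in)):
        if abs(new - old) > threshold:
            state[i] = new
            # Input i feeds outputs i-k+1 .. i, clipped to the valid range.
            for j in range(max(0, i - k + 1), min(len(out) - 1, i) + 1):
                dirty.add(j)
    for j in dirty:
        out[j] = sum(state[j + t] * kernel[t] for t in range(k))
    return state, out, len(dirty)

kernel = [1.0, 2.0, 1.0]
frame0 = [0.0] * 10
frame1 = list(frame0)
frame1[4] = 5.0                      # a single pixel changes between frames
out0 = conv1d_valid(frame0, kernel)
state1, out1, recomputed = change_based_update(frame0, out0, frame1, kernel, 0.0)
print(recomputed, "of", len(out0), "outputs recomputed")
```

With a threshold of zero the result matches a full recomputation; a larger threshold trades accuracy for fewer updates.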

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present CHIPMUNK, a small (<1 mm²) hardware accelerator for Long-Short Term Memory RNNs in UMC 65 nm technology capable...

10.1109/cicc.2018.8357068 article EN 2018 IEEE Custom Integrated Circuits Conference (CICC) 2018-04-01

Large language models (LLMs) are widely used across various applications, but their substantial computational requirements pose significant challenges, particularly in terms of HBM bandwidth bottlenecks and inter-device communication overhead. In this paper, we present PRESERVE, a novel prefetching framework designed to optimize LLM inference by overlapping memory reads for model weights and KV-cache with collective communication operations. Through extensive experiments conducted on commercial AI...

10.48550/arxiv.2501.08192 preprint EN arXiv (Cornell University) 2025-01-14

The attention mechanism is essential for the impressive capabilities of transformer-based Large Language Models (LLMs). However, calculating attention is computationally intensive due to its quadratic dependency on the sequence length. We introduce a novel approach called Top-Theta Attention, or simply Top-$\theta$, which selectively prunes less essential attention elements by comparing them against carefully calibrated thresholds. This method greatly improves the efficiency of self-attention matrix multiplication while preserving...

10.48550/arxiv.2502.08363 preprint EN arXiv (Cornell University) 2025-02-12
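
Threshold-based pruning of attention elements, as described in the abstract above, can be sketched for a single attention row. This is an illustrative sketch only: the threshold value `theta`, the renormalization of the softmax over the kept elements, and the fallback to the largest score are assumptions, and the calibration procedure from the paper is not reproduced.

```python
import math

def thresholded_softmax(scores, theta):
    """Softmax over one row of attention scores, keeping only scores >= theta."""
    kept = [i for i, s in enumerate(scores) if s >= theta]
    if not kept:  # assumed fallback: keep the single largest score
        kept = [max(range(len(scores)), key=lambda i: scores[i])]
    m = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - m) for i in kept}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

def attend(scores, values, theta):
    """Weighted sum of value vectors, skipping pruned positions entirely."""
    weights = thresholded_softmax(scores, theta)
    dim = len(values[0])
    out = [0.0] * dim
    for w, v in zip(weights, values):
        if w == 0.0:
            continue  # pruned position: no multiply-accumulate needed
        for d in range(dim):
            out[d] += w * v[d]
    return out, weights

scores = [2.0, -1.0, 0.5, -3.0]            # illustrative pre-softmax scores
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
out, weights = attend(scores, values, theta=0.0)
print(weights)
print(out)
```

Positions whose score falls below `theta` contribute nothing to the output, so their value rows never need to be read or multiplied.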

10.1109/tcad.2025.3558140 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2025-01-01

The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent works focus on reducing complexity for inference on embedded computing platforms. We adopt an orthogonal viewpoint and propose a novel algorithm exploiting...

10.1109/tcsvt.2019.2903421 article EN IEEE Transactions on Circuits and Systems for Video Technology 2019-03-06

In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly. This has sparked a surge of research into specialized hardware accelerators. Their performance is typically limited by I/O bandwidth, power consumption is dominated by I/O transfers to off-chip memory, and on-chip...

10.1109/jetcas.2019.2950093 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2019-10-29