Alessio Brutti

ORCID: 0000-0003-4146-3071
Research Areas
  • Speech and Audio Processing
  • Music and Audio Processing
  • Speech Recognition and Synthesis
  • Indoor and Outdoor Localization Technologies
  • Advanced Adaptive Filtering Techniques
  • Video Surveillance and Tracking Methods
  • Speech and dialogue systems
  • Emotion and Mood Recognition
  • Music Technology and Sound Studies
  • Multimodal Machine Learning Applications
  • Hearing Loss and Rehabilitation
  • Natural Language Processing Techniques
  • Phonetics and Phonology Research
  • Human Pose and Action Recognition
  • Topic Modeling
  • Blind Source Separation Techniques
  • Underwater Acoustics Research
  • Gait Recognition and Analysis
  • Animal Vocal Communication and Behavior
  • Anomaly Detection Techniques and Applications
  • Domain Adaptation and Few-Shot Learning
  • Target Tracking and Data Fusion in Sensor Networks
  • Sentiment Analysis and Opinion Mining
  • Human Mobility and Location-Based Analysis
  • Gaze Tracking and Assistive Technology

Fondazione Bruno Kessler
2016-2025

Free University of Bozen-Bolzano
2022

Queen Mary University of London
2016

Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare
2005

10.1109/icassp49660.2025.10889251 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Compact multi-sensor platforms are portable and thus desirable for robotics and personal-assistance tasks. However, compared to physically distributed sensors, the small size of these platforms makes person tracking more difficult. To address this challenge, we propose a novel 3-D audio-visual people tracker that exploits visual observations (object detections) to guide the acoustic processing by constraining the likelihood on a horizontal plane defined by the predicted height of the speaker. This solution allows estimating, with a small...

10.1109/tmm.2019.2902489 article EN IEEE Transactions on Multimedia 2019-03-01
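
A minimal sketch of the plane-constrained search idea described in the abstract above: instead of scanning a full 3-D volume, the acoustic likelihood is evaluated only on a horizontal grid at the height predicted from a visual detection. The `srp_score` function, the room extent, and the predicted height are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def srp_score(xyz):
    """Placeholder acoustic map value (e.g. SRP/GCF) at a 3-D point.
    A real tracker would compute this from multi-channel audio."""
    target = np.array([1.0, 2.0, 1.6])            # hypothetical true speaker position
    return np.exp(-np.sum((xyz - target) ** 2))

def localize_on_plane(height, x_range=(0, 4), y_range=(0, 4), step=0.1):
    """Evaluate the acoustic likelihood only on the plane z = height
    predicted from the visual detection, instead of the full 3-D volume."""
    best, best_xy = -np.inf, None
    for x in np.arange(*x_range, step):
        for y in np.arange(*y_range, step):
            s = srp_score(np.array([x, y, height]))
            if s > best:
                best, best_xy = s, (x, y)
    return (*best_xy, height), best

predicted_height = 1.6     # assumed to come from the visual detection
print(localize_on_plane(predicted_height))
```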

Outdoor acoustic event detection is an exciting research field, but it is challenged by the need for complex algorithms and deep learning techniques that typically require substantial computational, memory and energy resources. This challenge discourages IoT implementations, where an efficient use of resources is required. However, current embedded technologies and microcontrollers have increased their capabilities without penalizing energy efficiency. This paper addresses the application of sound event detection at the edge, optimizing the techniques on...

10.1109/jstsp.2020.2969775 article EN IEEE Journal of Selected Topics in Signal Processing 2020-01-27
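
The abstract above is about fitting sound event detection onto resource-constrained devices. As one illustrative (not paper-specific) optimization, the sketch below shows plain post-training INT8 weight quantization with NumPy, mapping float weights to 8-bit integers with a single per-tensor scale; layer sizes are made up.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~ scale * q, with q in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 40).astype(np.float32)   # toy layer weights
q, scale = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))
print("bytes: float32 =", w.nbytes, "| int8 =", q.nbytes)
```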

Humans express their emotions via facial expressions, voice intonation and word choices. To infer the nature of the underlying emotion, recognition models may use a single modality, such as vision, audio or text, or a combination of modalities. Generally, models that fuse complementary information from multiple modalities outperform their uni-modal counterparts. However, a successful model that fuses modalities requires components that can effectively aggregate task-relevant information from each modality. As cross-modal attention is seen as an effective...

10.1109/icassp43922.2022.9746924 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27
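
A minimal sketch of cross-modal attention in the sense used above, built on PyTorch's standard `nn.MultiheadAttention`: queries come from one modality (e.g. audio frames) and keys/values from another (e.g. visual features). Dimensions, sequence lengths and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

d_model, n_heads = 128, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Toy sequences: 50 audio frames attend to 20 visual feature vectors.
audio = torch.randn(1, 50, d_model)    # queries
visual = torch.randn(1, 20, d_model)   # keys and values

fused, attn_weights = cross_attn(query=audio, key=visual, value=visual)
print(fused.shape, attn_weights.shape)   # (1, 50, 128), (1, 50, 20)
```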

Comparing the different sound source localization techniques proposed in the literature during the last decade is a relevant topic in order to establish the advantages and disadvantages of a given approach in a real-time implementation. Traditionally, localization algorithms rely on an estimation of the time difference of arrival (TDOA) at microphone pairs through the GCC-PHAT. When several pairs are available, the source position can be estimated as the point in space that best fits the set of TDOA measurements, by applying the global coherence field (GCF), also...

10.1109/hscma.2008.4538690 article EN 2008-05-01
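
A minimal sketch of the GCC-PHAT estimation of a TDOA between two microphone signals, as referenced in the abstract above; the sampling rate and the synthetic delay are illustrative.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the TDOA (in seconds) of sig relative to ref via GCC-PHAT."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_lag = n // 2
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lag = np.argmax(np.abs(cc)) - max_lag
    return lag / fs

fs = 16000
x = np.random.randn(fs)                      # 1 s of noise at microphone 1
delay = 12                                   # delay in samples at microphone 2
y = np.concatenate((np.zeros(delay), x))[:fs]
print(gcc_phat(y, x, fs))                    # ~ 12 / 16000 s
```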

10.1155/2010/147495 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2010-01-01

10.1109/icassp49660.2025.10890639 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

We propose an audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array. After extracting cues from the individual modalities, we fuse them adaptively, according to their reliability, in a particle filter framework. The reliability of the audio signal is measured based on the maximum Global Coherence Field (GCF) peak value at each frame. The visual reliability is based on colour-histogram matching of the detection results against a reference image in RGB space. Experiments...

10.1109/icassp.2017.7952686 article EN 2017-03-01
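
A toy sketch of the adaptive fusion idea: particle weights combine audio and video likelihoods, with the audio contribution scaled by a reliability term derived from a (placeholder) GCF peak value. All functions, positions and numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 500
particles = rng.uniform([0, 0, 1.0], [4, 4, 2.0], size=(n_particles, 3))  # 3-D positions
weights = np.full(n_particles, 1.0 / n_particles)

def audio_likelihood(p, src=np.array([1.0, 2.0, 1.6])):
    return np.exp(-np.sum((p - src) ** 2, axis=-1) / 0.1)   # placeholder GCF-like score

def video_likelihood(p, src=np.array([1.1, 1.9, 1.6])):
    return np.exp(-np.sum((p - src) ** 2, axis=-1) / 0.05)  # placeholder detection score

gcf_peak = 0.7                     # assumed GCF peak value at this frame, in [0, 1]
alpha = gcf_peak                   # audio reliability weight (illustrative mapping)

# Adaptive fusion of per-particle likelihoods, then weight update and resampling.
lik = alpha * audio_likelihood(particles) + (1 - alpha) * video_likelihood(particles)
weights = weights * lik
weights /= weights.sum()
estimate = (weights[:, None] * particles).sum(axis=0)
idx = rng.choice(n_particles, size=n_particles, p=weights)   # multinomial resampling
particles = particles[idx]
print("estimated speaker position:", estimate)
```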

Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging task, especially when sound and video are collected with a compact sensing platform. In this paper, we propose a tracker that builds on generative and discriminative audio-visual likelihood models formulated in a particle filtering framework. We localize multiple speakers on a de-emphasized acoustic map, assisted by image detection-derived observations. The multi-modal observations are either assigned to existing tracks for...

10.1109/tmm.2021.3061800 article EN IEEE Transactions on Multimedia 2021-02-24
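
A toy sketch of the observation-to-track step mentioned above: each multi-modal observation is assigned to the nearest existing track if it falls within a gating distance, otherwise it spawns a new track. The gate value, track update rule and data are illustrative, not the paper's method.

```python
import numpy as np

GATE = 0.5                                  # assumed gating distance in metres
tracks = [np.array([1.0, 2.0, 1.6])]        # current track positions (one active speaker)

observations = [np.array([1.1, 2.1, 1.6]),  # near the existing track
                np.array([3.0, 0.5, 1.7])]  # far away: likely a new speaker

for obs in observations:
    dists = [np.linalg.norm(obs - t) for t in tracks]
    j = int(np.argmin(dists))
    if dists[j] < GATE:
        tracks[j] = 0.7 * tracks[j] + 0.3 * obs     # simple update of the matched track
        print(f"obs {obs} -> updated track {j}")
    else:
        tracks.append(obs)                           # birth of a new track
        print(f"obs {obs} -> new track {len(tracks) - 1}")
```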

An interface for distant-talking control of home devices requires the possibility of identifying the positions of multiple users. Acoustic maps, based either on the global coherence field (GCF) or on the oriented GCF (OGCF), have already been exploited successfully to determine the position and head orientation of a single speaker. This paper proposes a new method using acoustic maps to deal with the case of two simultaneous speakers. The method is based on a two-step analysis of the map: first the dominant speaker is localized; then the map is modified by compensating for the effects...

10.1109/icassp.2008.4518618 article EN Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing 2008-03-01
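
A toy sketch of the two-step map analysis described above: find the dominant peak on a (synthetic) acoustic map, de-emphasize the region around it, then pick the second speaker's peak. The Gaussian suppression is an illustrative stand-in for the paper's compensation of the dominant speaker's effects.

```python
import numpy as np

x, y = np.meshgrid(np.linspace(0, 4, 81), np.linspace(0, 4, 81), indexing="ij")

def bump(cx, cy, h):
    return h * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / 0.1)

# Synthetic GCF-like map with a dominant and a weaker simultaneous speaker.
gcf_map = bump(1.0, 1.0, 1.0) + bump(3.0, 2.5, 0.6) + 0.01 * np.random.rand(*x.shape)

def peak(m):
    i, j = np.unravel_index(np.argmax(m), m.shape)
    return x[i, j], y[i, j]

p1 = peak(gcf_map)                                   # step 1: dominant speaker
suppress = np.exp(-((x - p1[0]) ** 2 + (y - p1[1]) ** 2) / 0.3)
p2 = peak(gcf_map * (1 - suppress))                  # step 2: second speaker on modified map
print("dominant:", p1, "second:", p2)
```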

This paper describes a surveillance system for intrusion detection which is based only on information derived from the processing of audio signals acquired by a distributed microphone network (DMN). In particular, it exploits different acoustic features and estimates event positions in order to detect intrusions and reject possible false alarms that may be generated by sound sources inside or outside the monitored room. An evaluation has been conducted to measure the performance in terms of missed detections in the presence of events produced by test...

10.1109/avss.2009.49 article EN 2009-09-01
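
A small sketch of the position-based false-alarm rejection logic suggested above: an acoustic event is kept as an intrusion alarm only if its estimated position falls inside the monitored room boundary. The room rectangle and the event list are made-up examples.

```python
# Monitored room modelled as an axis-aligned rectangle (metres); assumed geometry.
ROOM = {"xmin": 0.0, "xmax": 5.0, "ymin": 0.0, "ymax": 4.0}

def inside_room(x, y, room=ROOM):
    return room["xmin"] <= x <= room["xmax"] and room["ymin"] <= y <= room["ymax"]

# (label, estimated x, estimated y) produced by detection + localization (toy values).
events = [("glass_break", 2.1, 1.3), ("door_slam", 6.4, 2.0), ("speech", 1.0, 3.5)]

for label, ex, ey in events:
    verdict = "ALARM" if inside_room(ex, ey) else "rejected (outside room)"
    print(f"{label:12s} at ({ex:.1f}, {ey:.1f}) -> {verdict}")
```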

Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network. In this paper we propose qualitative modifications to the model that significantly improve learning efficiency and overall performance. In particular, we introduce a novel loss function, called Sample Mean Loss, and present a better modelling of speaker turn behaviour, by devising an analytical expression to compute the probability of a new speaker joining...

10.1109/icassp40776.2020.9053477 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

This paper presents an analysis of the Low-Complexity Acoustic Scene Classification task in the DCASE 2022 Challenge. The task was a continuation from previous years, but the low-complexity requirements were changed to the following: the maximum number of allowed parameters, including zero-valued ones, was 128 K, with the parameters being represented using the INT8 numerical format; and the maximum number of multiply-accumulate operations at inference time was 30 million. The provided baseline system is a convolutional neural network which employs...

10.48550/arxiv.2206.03835 preprint EN other-oa arXiv (Cornell University) 2022-01-01
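
A quick back-of-the-envelope check of the complexity limits quoted above (128 K INT8 parameters, 30 M multiply-accumulate operations per inference), applied to a made-up two-layer CNN on a 40x64 log-mel input; the layer shapes are illustrative and not the challenge baseline.

```python
def conv2d_cost(in_ch, out_ch, k, out_h, out_w):
    params = in_ch * out_ch * k * k + out_ch        # weights + biases
    macs = out_h * out_w * k * k * in_ch * out_ch   # one MAC per weight per output cell
    return params, macs

layers = [
    conv2d_cost(1, 16, 3, 40, 64),     # conv1 on a 40x64 log-mel patch (same padding)
    conv2d_cost(16, 32, 3, 20, 32),    # conv2 after 2x2 pooling
]
# Global average pooling, then a 32 -> 10 dense classifier.
dense_params, dense_macs = 32 * 10 + 10, 32 * 10

params = sum(p for p, _ in layers) + dense_params
macs = sum(m for _, m in layers) + dense_macs
print(f"params = {params:,} (limit: 128 K, INT8)")
print(f"MACs   = {macs:,} (limit: 30 M per inference)")
```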

Acoustic maps created on the basis of signals acquired by distributed networks of microphones allow one to identify the position and orientation of an active talker in an enclosure. In adverse situations with high background noise, reverberation or unavailability of direct paths to the microphones, localization may fail. This paper proposes a novel approach to the estimation of head orientation based on the classification of global coherence field (GCF) and oriented GCF maps. Preliminary experiments with data obtained from simulated propagation as well as from a real room...

10.1109/icassp.2007.366957 article EN 2007-04-01

Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard processing algorithms. In this application scenario, a crucial task is the detection and localization of speech events generated by users within the various rooms. A specific challenge of multi-room environments is the inter-room interference that negatively affects activity detectors. In this paper, we...

10.1109/eusipco.2015.7362588 article EN 2015-08-01

In this paper, we carry out an analysis on the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection to each estimated speaker signal. In particular, we compare two low-latency separation models. Moreover, we show a post-processing algorithm that significantly reduces the false alarm errors of the pipeline. We perform our experiments on two datasets: Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably,...

10.1109/slt54892.2023.10023280 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2023-01-09
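
A schematic sketch of the SSGD pipeline described above: a (placeholder) separation front-end yields one waveform per speaker, a simple energy-based VAD is run on each estimated signal, and the resulting speech segments, labelled by source index, form the diarization output. The separation model is only stubbed out here; thresholds and durations are made up.

```python
import numpy as np

def separate(mixture, n_speakers=2):
    """Stub for a speech separation model; it just returns copies of the mixture
    so that the rest of the pipeline runs end to end."""
    return [mixture.copy() for _ in range(n_speakers)]

def energy_vad(signal, fs, frame=0.02, thr_db=-35.0):
    """Very simple frame-level energy VAD returning (start, end) segments in seconds."""
    hop = int(frame * fs)
    frames = signal[: len(signal) // hop * hop].reshape(-1, hop)
    db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = db > thr_db
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * frame
        elif not a and start is not None:
            segments.append((start, i * frame))
            start = None
    if start is not None:
        segments.append((start, len(active) * frame))
    return segments

fs = 8000
mixture = np.random.randn(5 * fs) * 0.1          # toy 5 s "conversation"
for spk, est in enumerate(separate(mixture)):
    for seg in energy_vad(est, fs):
        print(f"speaker {spk}: {seg[0]:.2f}-{seg[1]:.2f} s")
```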

We performed an experimental review of current diarization systems for the conversational telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms belonging to the clustering-based, end-to-end neural diarization (EEND), and separation guided diarization (SSGD) paradigms. We studied their inference-time computational requirements and accuracy on four CTS datasets with different characteristics and languages. We found that, among all the methods considered, EEND-vector clustering (EEND-VC) offers the best trade-off in terms...

10.1016/j.csl.2023.101534 article EN cc-by Computer Speech & Language 2023-05-30
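
For reference, a small helper for the diarization error rate typically used to score such systems, DER = (missed speech + false alarm + speaker confusion) / total reference speech; the component durations below are made-up numbers, not results from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed + false alarm + speaker confusion) / total reference speech time."""
    return (missed + false_alarm + confusion) / total_speech

# Toy component durations in seconds for one test conversation.
print(f"DER = {diarization_error_rate(12.0, 7.5, 20.5, 600.0):.2%}")
```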

Domestic environments are particularly challenging for distant speech recognition and audio processing in general. Reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard algorithms. The DIRHA EU project addresses the development of distant-speech interaction with devices and services within the multiple rooms of typical apartments. A corpus of multichannel data has been created to represent realistic scenes,...

10.1109/hscma.2014.6843271 article EN 2014-05-01