NFDI4DS | UHH-SEMS - Publication Details

Maurizio Omologo

ORCID: 0000-0003-0879-0548

Contact & Profiles

A5025736898

Research Areas

Speech and Audio Processing
Speech Recognition and Synthesis
Music and Audio Processing
Advanced Adaptive Filtering Techniques
Indoor and Outdoor Localization Technologies
Blind Source Separation Techniques
Speech and dialogue systems
Natural Language Processing Techniques
Phonetics and Phonology Research
Music Technology and Sound Studies
Advanced Data Compression Techniques
Topic Modeling
Video Surveillance and Tracking Methods
Underwater Acoustics Research
Spectroscopy and Chemometric Analyses
Hearing Loss and Rehabilitation
Acoustic Wave Phenomena Research
Hydraulic and Pneumatic Systems
Neural Networks and Applications
Target Tracking and Data Fusion in Sensor Networks
Optical Systems and Laser Technology
Linguistic Variation and Morphology
Image Processing Techniques and Applications
CCD and CMOS Imaging Sensors
Robotics and Automated Systems

Amazon (United States)
2021-2022

Amazon (Germany)
2021-2022

Fondazione Bruno Kessler
2012-2021

University of Trento
2009-2012

Shine Micro (United States)
2010

Construction Technologies Institute
2006-2007

Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare
1993-2006

Telecom Italia Lab
2003

Light Gated Recurrent Units for Speech Recognition

OPENALEX - Publications

Mirco Ravanelli Philémon Brakel Maurizio Omologo Yoshua Bengio

A field that has directly benefited from the recent advances in deep learning is automatic speech recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern recognizers often employ acoustic models based on recurrent neural networks (RNNs), which are naturally able to exploit large time contexts...

10.1109/tetci.2017.2762739 article EN IEEE Transactions on Emerging Topics in Computational Intelligence 2018-03-23

Acoustic event localization using a crosspower-spectrum phase based technique

Maurizio Omologo Piergiorgio Svaizer

Linear microphone arrays can be employed for acoustic event localization in a noisy environment using time delay estimation. Three techniques are investigated that allow this estimation, namely normalized cross correlation, LMS adaptive filters, and crosspower-spectrum phase: they are combined with a bidimensional representation, the coherence measure, in order to emphasize the information exploited for estimating the position of both non-moving and moving sources. To compare the given techniques, different sources were...

10.1109/icassp.1994.389667 article EN 2002-12-17

Use of the crosspower-spectrum phase in acoustic event location

Maurizio Omologo Piergiorgio Svaizer

The article reports on the use of crosspower-spectrum phase (CSP) analysis as an accurate time delay estimation (TDE) technique. It is used in a microphone array system for the location of acoustic events in noisy and reverberant environments. A corresponding coherence measure (CM) and its graphical representation are introduced to show the TDE accuracy. Using a two-microphone pair array, real experiments show an average error of less than 10 cm in a 6 m × 6 m area.

10.1109/89.568735 article EN IEEE Transactions on Speech and Audio Processing 1997-05-01
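
The CSP idea summarized in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common formulation in which the cross-power spectrum of two microphone signals is normalized to unit magnitude, so that only the phase remains, and is then transformed back to the lag domain. The function name `csp_delay`, its parameters, and the use of the peak value as a rough stand-in for a coherence measure are all assumptions made for this example.

```python
import numpy as np


def csp_delay(x1, x2, fs, max_delay_s=None):
    """Estimate the delay of x2 relative to x1 (in seconds) from the CSP peak.

    Hypothetical helper for illustration; a positive result means x2 lags x1.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    spec1 = np.fft.rfft(x1, n=n)
    spec2 = np.fft.rfft(x2, n=n)
    cross = spec2 * np.conj(spec1)        # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # keep the phase only (unit magnitude)
    csp = np.fft.irfft(cross, n=n)        # back to the lag domain

    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = max(1, min(int(fs * max_delay_s), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    csp = np.concatenate((csp[-max_shift:], csp[:max_shift + 1]))
    lag = int(np.argmax(csp)) - max_shift
    return lag / fs, float(np.max(csp))


# Synthetic check: white noise and a copy delayed by 7 samples.
rng = np.random.default_rng(0)
reference = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(7), reference[:-7]))
delay, peak = csp_delay(reference, delayed, fs=16000, max_delay_s=0.002)
print(delay * 16000, peak)
```

Restricting the search with `max_delay_s` reflects the fact that the physically possible delay is bounded by the microphone spacing divided by the speed of sound; the value used in the demo is arbitrary.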

Automatic segmentation and labeling of speech based on Hidden Markov Models

Fabio Brugnara Daniele Falavigna Maurizio Omologo

10.1016/0167-6393(93)90083-w article FR Speech Communication 1993-08-01

Acoustic source location in noisy and reverberant environment using CSP analysis

Maurizio Omologo Piergiorgio Svaizer

A linear four-microphone array can be employed for acoustic event location in a real environment using an accurate time delay estimation. This paper refers to the use of a specific technique, based on crosspower spectrum phase (CSP) analysis, that yielded performance. The behavior of this technique is investigated under different noise and reverberation conditions. Real experiments as well as simulations were conducted to analyze a wide variety of situations. Results show the system robustness at quite critical...

10.1109/icassp.1996.543272 article EN 2002-12-24

Convolutive BSS of Short Mixtures by ICA Recursively Regularized Across Frequencies

Francesco Nesta Piergiorgio Svaizer Maurizio Omologo

This paper proposes a new method of frequency-domain blind source separation (FD-BSS), able to separate acoustic sources in challenging conditions. In BSS, the time-domain signals are transformed into time-frequency series and the separation is generally performed by applying independent component analysis (ICA) at each frequency envelope. When short signals are observed and long demixing filters are required, the number of time observations for each frequency is limited and the variance of the ICA estimator increases due to the intrinsic statistical bias. Furthermore,...

10.1109/tasl.2010.2053027 article EN IEEE Transactions on Audio Speech and Language Processing 2010-06-23

The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments

Mirco Ravanelli Luca Cristoforetti Roberto Gretter Marco Pellin Alessandro Sosi and 1 more

This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this...

10.1109/asru.2015.7404805 article EN 2015-12-01

Improving Speech Recognition by Revising Gated Recurrent Units

Mirco Ravanelli Philémon Brakel Maurizio Omologo Yoshua Bengio

Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and their robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which might impair their efficient implementation. An...

10.21437/interspeech.2017-775 article EN Interspeech 2017 2017-08-16

Multi-Speaker Tracking From an Audio–Visual Sensing Device

Xinyuan Qian Alessio Brutti Oswald Lanz Maurizio Omologo Andrea Cavallaro

Compact multi-sensor platforms are portable and thus desirable for robotics and personal-assistance tasks. However, compared to physically distributed sensors, the size of these platforms makes person tracking more difficult. To address this challenge, we propose a novel 3-D audio-visual people tracker that exploits visual observations (object detections) to guide the acoustic processing by constraining the acoustic likelihood on the horizontal plane defined by the predicted height of a speaker. This solution allows the tracker to estimate, with a small...

10.1109/tmm.2019.2902489 article EN IEEE Transactions on Multimedia 2019-03-01

Context-Aware Transformer Transducer for Speech Recognition

Feng-Ju Chang Jing Liu Martin Radfar Athanasios Mouchtaris Maurizio Omologo and 2 more

End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty recognizing uncommon words that appear infrequently in the training data. One promising method to improve the recognition accuracy on such rare words is to latch onto personalized/contextual information at inference. In this work, we present a novel context-aware transformer transducer (CATT) network that improves the state-of-the-art transformer-based ASR system by taking advantage of such contextual signals. Specifically, we propose a multi-head...

10.1109/asru51503.2021.9687895 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

Acoustic source location in a three-dimensional space using crosspower spectrum phase

Piergiorgio Svaizer Marco Matassoni Maurizio Omologo

A microphone array can be used to locate a dominant acoustic source in a given environment. This capability is successfully employed to locate an active talker in teleconferencing or other multi-speaker applications. In this work the source location is obtained in two steps: (1) a time difference of arrival (TDOA) computation between the signals of the array; (2) an "optimal" source location based on the interchannel delay estimates and a geometrical description of the sensor arrangement. The crosspower spectrum phase technique was used for TDOA estimation, while...

10.1109/icassp.1997.599611 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2002-11-22
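
The second of the two steps described in the abstract above (turning delay estimates plus array geometry into a position) can be illustrated with a simple sketch. The abstract is truncated before it names the method actually used, so the approach below is a swapped-in assumption: an exhaustive least-squares search over a grid of candidate points. The function `locate_source`, the speed of sound value, the microphone layout, and the grid are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def locate_source(mics, pairs, tdoas, grid):
    """Return the grid point whose predicted TDOAs best fit the measured ones.

    mics  : (M, 3) microphone coordinates in metres
    pairs : list of (i, j) microphone index pairs
    tdoas : measured delay of mic j relative to mic i, one per pair, in seconds
    grid  : (P, 3) candidate source positions in metres
    """
    mics = np.asarray(mics, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance from every candidate point to every microphone: shape (P, M).
    dist = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2)
    i, j = np.asarray(pairs).T
    predicted = (dist[:, j] - dist[:, i]) / SPEED_OF_SOUND
    error = np.sum((predicted - np.asarray(tdoas, dtype=float)) ** 2, axis=1)
    return grid[np.argmin(error)]


# Hypothetical layout: five microphones in a 6 m x 6 m x 3 m room.
mics = np.array([[0.0, 0.0, 1.0], [6.0, 0.0, 1.5], [6.0, 6.0, 2.0],
                 [0.0, 6.0, 2.5], [3.0, 0.0, 0.5]])
pairs = [(0, 1), (0, 2), (0, 3), (0, 4)]

# Simulate noise-free TDOAs for a known source position.
source = np.array([2.0, 3.5, 1.5])
d = np.linalg.norm(source - mics, axis=1)
tdoas = [(d[j] - d[i]) / SPEED_OF_SOUND for i, j in pairs]

# Candidate positions on a 0.5 m grid.
axes = (np.arange(0, 6.5, 0.5), np.arange(0, 6.5, 0.5), np.arange(0, 3.5, 0.5))
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

print(locate_source(mics, pairs, tdoas, grid))
```

A grid search is easy to read but scales poorly with resolution; it is used here only because it makes the fit between predicted and measured delays explicit.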

Generalized State Coherence Transform for Multidimensional TDOA Estimation of Multiple Sources

Francesco Nesta Maurizio Omologo

According to the physical meaning of frequency-domain blind source separation (FD-BSS), each mixing matrix estimated by independent component analysis (ICA) contains information on the acoustic propagation related to each source and can then be used for localization purposes. In this paper, we analyze the Generalized State Coherence Transform (GSCT), which is a non-linear transform of the space represented by the whole demixing matrices. The transform enables an accurate estimation of the time-delay of multiple sources in multiple dimensions. Furthermore,...

10.1109/tasl.2011.2160168 article EN IEEE Transactions on Audio Speech and Language Processing 2011-06-21

Realistic Multi-Microphone Data Simulation for Distant Speech Recognition

Mirco Ravanelli Piergiorgio Svaizer Maurizio Omologo

The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility and low computational cost of a data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data from the targeted environment. In the last decade, several simulated corpora have been released to the research community, including the data-sets distributed in the context...

10.21437/interspeech.2016-731 article EN Interspeech 2016 2016-08-29

Environmental conditions and acoustic transduction in hands-free speech recognition

Maurizio Omologo Piergiorgio Svaizer Marco Matassoni

10.1016/s0167-6393(98)00030-2 article EN Speech Communication 1998-08-01

Speaker independent continuous speech recognition using an acoustic-phonetic Italian corpus

B. Angelini Fabio Brugnara Daniele Falavigna Diego Giuliani Roberto Gretter and 1 more

10.21437/icslp.1994-362 article EN 1994-09-18

Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays

Alessio Brutti Maurizio Omologo Piergiorgio Svaizer

10.21437/interspeech.2005-745 article EN Interspeech 2005 2005-09-04

Comparison Between Different Sound Source Localization Techniques Based on a Real Data Collection

Alessio Brutti Maurizio Omologo Piergiorgio Svaizer

Comparing the different sound source localization techniques proposed in the literature during the last decade represents a relevant topic in order to establish the advantages and disadvantages of a given approach in a real-time implementation. Traditionally, algorithms for sound source localization rely on an estimation of the time difference of arrival (TDOA) at microphone pairs through GCC-PHAT. When several microphone pairs are available, the source position can be estimated as the point in space that best fits the set of TDOA measurements by applying the global coherence field (GCF), also...

10.1109/hscma.2008.4538690 article EN 2008-05-01

Multiple Source Localization Based on Acoustic Map De-Emphasis

Alessio Brutti Maurizio Omologo Piergiorgio Svaizer

10.1155/2010/147495 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2010-01-01

Batch-normalized joint training for DNN-based distant speech recognition

Mirco Ravanelli Philémon Brakel Maurizio Omologo Yoshua Bengio

Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still exhibits a lack of robustness, especially when adverse acoustic conditions are met. Despite the significant progress made in the last years on both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing modules that are not well matched because they are not trained jointly.

10.1109/slt.2016.7846241 article EN 2016 IEEE Spoken Language Technology Workshop (SLT) 2016-12-01

Audio-Visual Tracking of Concurrent Speakers

Xinyuan Qian Alessio Brutti Oswald Lanz Maurizio Omologo Andrea Cavallaro

Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging task, especially when sound and video are collected with a compact sensing platform. In this paper, we propose a tracker that builds on generative and discriminative audio-visual likelihood models formulated in a particle filtering framework. We localize multiple concurrent speakers with a de-emphasized acoustic map assisted by the image detection-derived observations. The multi-modal observations are either assigned to existing tracks for...

10.1109/tmm.2021.3061800 article EN IEEE Transactions on Multimedia 2021-02-24
