Maurizio Omologo

ORCID: 0000-0003-0879-0548
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Advanced Adaptive Filtering Techniques
  • Indoor and Outdoor Localization Technologies
  • Blind Source Separation Techniques
  • Speech and Dialogue Systems
  • Natural Language Processing Techniques
  • Phonetics and Phonology Research
  • Music Technology and Sound Studies
  • Advanced Data Compression Techniques
  • Topic Modeling
  • Video Surveillance and Tracking Methods
  • Underwater Acoustics Research
  • Spectroscopy and Chemometric Analyses
  • Hearing Loss and Rehabilitation
  • Acoustic Wave Phenomena Research
  • Hydraulic and Pneumatic Systems
  • Neural Networks and Applications
  • Target Tracking and Data Fusion in Sensor Networks
  • Optical Systems and Laser Technology
  • Linguistic Variation and Morphology
  • Image Processing Techniques and Applications
  • CCD and CMOS Imaging Sensors
  • Robotics and Automated Systems

Amazon (United States)
2021-2022

Amazon (Germany)
2021-2022

Fondazione Bruno Kessler
2012-2021

University of Trento
2009-2012

Shine Micro (United States)
2010

Construction Technologies Institute
2006-2007

Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare
1993-2006

Telecom Italia Lab
2003

A field that has directly benefited from the recent advances in deep learning is automatic speech recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern recognizers often employ acoustic models based on recurrent neural networks (RNNs), which are naturally able to exploit large time contexts...

10.1109/tetci.2017.2762739 article EN IEEE Transactions on Emerging Topics in Computational Intelligence 2018-03-23

Linear microphone arrays can be employed for acoustic event localization in a noisy environment using time delay estimation. Three techniques that allow this estimation are investigated, namely normalized cross correlation, LMS adaptive filters, and crosspower-spectrum phase. They are combined with a bidimensional representation, the coherence measure, in order to emphasize the information exploited for estimating the position of both non-moving and moving sources. To compare the given techniques, different sources were...

10.1109/icassp.1994.389667 article EN 2002-12-17

The article reports on the use of crosspower-spectrum phase (CSP) analysis as an accurate time delay estimation (TDE) technique. It is used in a microphone array system for the location of acoustic events in noisy and reverberant environments. A corresponding coherence measure (CM) and its graphical representation are introduced to show the TDE accuracy. Using a two-microphone pair array, real experiments show less than 10 cm average location error in a 6 m × 6 m area.

10.1109/89.568735 article EN IEEE Transactions on Speech and Audio Processing 1997-05-01
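
The CSP technique named in the abstract above is commonly described as a phase-only (whitened) cross-correlation, also known as GCC-PHAT. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `csp_delay`, its parameters, and the small regularization constant are assumptions made for illustration.

```python
import numpy as np

def csp_delay(x1, x2, fs, max_delay=None):
    """Estimate the delay of x2 relative to x1, in seconds (hypothetical helper).

    Positive values mean that x2 lags x1.
    """
    # Zero-pad so that the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum, normalized to unit magnitude so that only
    # the phase is kept. The constant 1e-12 (an assumption) avoids
    # division by zero in empty frequency bins.
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain: the peak marks the most coherent delay.
    csp = np.fft.irfft(cross, n=n)

    # Optionally restrict the search to physically plausible lags.
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(round(max_delay * fs)), n // 2))

    # Reorder so that lags run from -max_shift to +max_shift.
    csp = np.concatenate((csp[-max_shift:], csp[:max_shift + 1]))
    lag = int(np.argmax(csp)) - max_shift
    return lag / fs, csp
```

For a microphone pair with spacing d, `max_delay` could be set to d divided by the speed of sound, which discards peaks that no physical source position could produce.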

A linear four-microphone array can be employed for acoustic event location in a real environment using an accurate time delay estimation. This paper refers to the use of a specific technique, based on crosspower spectrum phase (CSP) analysis, that yielded performance. The behavior of this technique is investigated under different noise and reverberation conditions. Real experiments as well as simulations were conducted to analyze a wide variety of situations. Results show the system robustness at quite critical...

10.1109/icassp.1996.543272 article EN 2002-12-24

This paper proposes a new method of frequency-domain blind source separation (FD-BSS), able to separate acoustic sources in challenging conditions. In FD-BSS, the time-domain signals are transformed into time-frequency series and the separation is generally performed by applying independent component analysis (ICA) at each frequency envelope. When short signals are observed and long demixing filters are required, the number of time observations for each frequency is limited and the variance of the ICA estimator increases due to the intrinsic statistical bias. Furthermore,...

10.1109/tasl.2010.2053027 article EN IEEE Transactions on Audio Speech and Language Processing 2010-06-23

This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this...

10.1109/asru.2015.7404805 article EN 2015-12-01

Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and their robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which might impair their efficient implementation. An...

10.21437/interspeech.2017-775 article EN Interspeech 2017 2017-08-16

Compact multi-sensor platforms are portable and thus desirable for robotics and personal-assistance tasks. However, compared to physically distributed sensors, the size of these platforms makes person tracking more difficult. To address this challenge, we propose a novel 3-D audio-visual people tracker that exploits visual observations (object detections) to guide the acoustic processing by constraining the acoustic likelihood on the horizontal plane defined by the predicted height of a speaker. This solution makes it possible to estimate, with a small...

10.1109/tmm.2019.2902489 article EN IEEE Transactions on Multimedia 2019-03-01

End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty recognizing uncommon words that appear infrequently in the training data. One promising method to improve the recognition accuracy on such rare words is to latch onto personalized/contextual information at inference. In this work, we present a novel context-aware transformer transducer (CATT) network that improves the state-of-the-art transformer-based ASR system by taking advantage of such contextual signals. Specifically, we propose a multi-head...

10.1109/asru51503.2021.9687895 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

A microphone array can be used to locate a dominant acoustic source in a given environment. This capability is successfully employed to locate an active talker in teleconferencing or other multi-speaker applications. In this work the source location is obtained in two steps: (1) a time difference of arrival (TDOA) computation between the signals of the array; (2) an "optimal" location estimate based on the interchannel delay estimates and a geometrical description of the sensor arrangement. The crosspower spectrum phase technique was used for TDOA estimation, while...

10.1109/icassp.1997.599611 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2002-11-22

According to the physical meaning of frequency-domain blind source separation (FD-BSS), each mixing matrix estimated by independent component analysis (ICA) contains information on the acoustic propagation related to each source and can then be used for localization purposes. In this paper, we analyze the Generalized State Coherence Transform (GSCT), which is a non-linear transform of the space represented by the whole set of demixing matrices. The transform enables an accurate estimation of the propagation time-delay of multiple sources in multiple dimensions. Furthermore,...

10.1109/tasl.2011.2160168 article EN IEEE Transactions on Audio Speech and Language Processing 2011-06-21

The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. The reliability, flexibility and low computational cost of a data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data from the targeted environment. In the last decade, several simulated corpora have been released to the research community, including the data-sets distributed in the context...

10.21437/interspeech.2016-731 article EN Interspeech 2016 2016-08-29

Comparing the different sound source localization techniques proposed in the literature during the last decade represents a relevant topic in order to establish the advantages and disadvantages of a given approach in a real-time implementation. Traditionally, algorithms for sound source localization rely on an estimation of the time difference of arrival (TDOA) at microphone pairs through GCC-PHAT. When several microphone pairs are available, the source position can be estimated as the point in space that best fits the set of TDOA measurements by applying the global coherence field (GCF), also...

10.1109/hscma.2008.4538690 article EN 2008-05-01

10.1155/2010/147495 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2010-01-01

Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still exhibits a lack of robustness, especially when adverse acoustic conditions are met. Despite the significant progress made in the last years on both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing modules that are not well matched because they are not trained jointly.

10.1109/slt.2016.7846241 article EN 2016 IEEE Spoken Language Technology Workshop (SLT) 2016-12-01

Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging task, especially when sound and video are collected with a compact sensing platform. In this paper, we propose a tracker that builds on generative and discriminative audio-visual likelihood models formulated in a particle filtering framework. We localize multiple concurrent speakers with a de-emphasized acoustic map assisted by the image detection-derived observations. The multi-modal observations are either assigned to existing tracks for...

10.1109/tmm.2021.3061800 article EN IEEE Transactions on Multimedia 2021-02-24