Mathew Magimai.-Doss

ORCID: 0000-0002-8714-1409
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Speech and Dialogue Systems
  • Emotion and Mood Recognition
  • Phonetics and Phonology Research
  • Voice and Speech Disorders
  • Hand Gesture Recognition Systems
  • Blind Source Separation Techniques
  • Hearing Impairment and Communication
  • Neural Networks and Applications
  • Topic Modeling
  • Advanced Data Compression Techniques
  • Advanced Adaptive Filtering Techniques
  • Human Pose and Action Recognition
  • Gait Recognition and Analysis
  • Text and Document Classification Technologies
  • Advanced Memory and Neural Computing
  • Indoor and Outdoor Localization Technologies
  • Phonocardiography and Auscultation Techniques
  • Animal Vocal Communication and Behavior
  • Video Analysis and Summarization
  • Mental Health via Writing
  • Neural Dynamics and Brain Function

Idiap Research Institute
2016-2025

Radboud University Nijmegen
2012

International Computer Science Institute
2007-2008

École Polytechnique Fédérale de Lausanne
2002-2006

Dalle Molle Institute for Artificial Intelligence Research
2003-2004

Universitat Politècnica de Catalunya
2003

University of Washington
1991-1998

Chiron (Norway)
1998

Center Point
1998

IBM (United States)
1981

Automatic speech recognition systems typically model the relationship between the acoustic signal and phones in two separate steps: feature extraction and classifier training. In our recent works, we have shown that, in the framework of convolutional neural networks (CNN), the raw speech signal can be directly modeled and ASR systems competitive to the standard approach can be built. In this paper, we first analyze and show that, in its first layers, the CNN learns (in parts) and models phone-specific spectral envelope information from 2-4 ms of speech. Given that the CNN-based approach yields trends...

10.21437/interspeech.2015-3 article EN Interspeech 2015 2015-09-06
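
As a rough illustration of the architecture family this abstract describes, the sketch below is a minimal raw-waveform CNN acoustic model in PyTorch: convolution layers operate directly on speech samples and a small head outputs phone-class scores. All layer sizes and kernel widths here are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal raw-waveform CNN acoustic model (illustrative sizes, not the
# paper's exact configuration): conv layers see raw samples, the head
# outputs phone-class logits (softmax would give posteriors).
import torch
import torch.nn as nn

class RawSpeechCNN(nn.Module):
    def __init__(self, n_phones=40):
        super().__init__()
        self.features = nn.Sequential(
            # the first kernel spans only a few ms of 16 kHz signal
            nn.Conv1d(1, 80, kernel_size=30, stride=10), nn.ReLU(),
            nn.Conv1d(80, 60, kernel_size=7), nn.ReLU(),
            nn.Conv1d(60, 60, kernel_size=7), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(60, 500), nn.ReLU(),
            nn.Linear(500, n_phones),
        )

    def forward(self, wav):            # wav: (batch, 1, n_samples)
        return self.classifier(self.features(wav))

x = torch.randn(8, 1, 4000)            # 250 ms of signal at 16 kHz
logits = RawSpeechCNN()(x)             # (8, n_phones)
```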

In hybrid hidden Markov model/artificial neural network (HMM/ANN) automatic speech recognition (ASR) systems, the phoneme class conditional probabilities are estimated by first extracting acoustic features from the speech signal based on prior knowledge, such as speech perception and/or production knowledge, and then modeling those features with an ANN. Recent advances in machine learning techniques, more specifically in the fields of image processing and text processing, have shown that such a divide-and-conquer strategy (i.e., separating...

10.21437/interspeech.2013-438 article EN Interspeech 2013 2013-08-25
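
For context, hybrid HMM/ANN decoding turns the ANN's phoneme posteriors into scaled likelihoods via Bayes' rule. The formulation below is the standard one for hybrid systems in general, not a result specific to this paper:

```latex
% Scaled-likelihood relation in hybrid HMM/ANN decoding: the ANN estimates
% the posterior P(q_k | x_t); dividing by the class prior P(q_k) gives a
% quantity proportional to the emission likelihood, since p(x_t) is
% constant across classes at each frame.
\[
  p(x_t \mid q_k) = \frac{P(q_k \mid x_t)\, p(x_t)}{P(q_k)}
  \;\propto\; \frac{P(q_k \mid x_t)}{P(q_k)}
\]
```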

State-of-the-art automatic speech recognition systems model the relationship between the acoustic signal and phone classes in two stages, namely, extraction of spectral-based features based on prior knowledge followed by training of an acoustic model, typically an artificial neural network (ANN). In our recent work, it was shown that Convolutional Neural Networks (CNNs) can learn relevant features directly from the raw signal, reaching performance on par with other existing feature-based approaches. This paper extends the CNN-based approach to large...

10.1109/icassp.2015.7178781 article EN 2015-04-01
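
A quick worked computation connects kernel width to the "2-4 ms" span mentioned in the first abstract above: the time covered by a 1-D convolution kernel on raw speech is simply kernel size over sampling rate. The kernel sizes below are illustrative, not taken from the papers.

```python
# Temporal span (in ms) of a 1-D convolution kernel on raw speech.
# Kernel sizes are illustrative; the papers report that the first
# layers model roughly 2-4 ms of signal.
def kernel_span_ms(kernel_size: int, sample_rate: int = 16000) -> float:
    return 1000.0 * kernel_size / sample_rate

for k in (30, 50, 64):
    print(f"{k} samples at 16 kHz -> {kernel_span_ms(k):.2f} ms")
# 30 samples at 16 kHz -> 1.88 ms
# 50 samples at 16 kHz -> 3.12 ms
# 64 samples at 16 kHz -> 4.00 ms
```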

Speaker verification systems traditionally extract and model cepstral features or filter bank energies from the speech signal. In this paper, inspired by the success of neural network-based approaches that directly model the raw signal for applications such as speech recognition, emotion recognition and anti-spoofing, we propose a speaker verification approach where speaker-discriminative information is learned by: (a) first training a CNN-based speaker identification system that takes the raw signal as input and learns to classify speakers (unknown to the verification system); and then (b)...

10.1109/icassp.2018.8462165 article EN 2018-04-01
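
The verification stage that follows such a pre-trained speaker-ID network is often an embedding comparison. The sketch below is a hedged illustration of that idea, scoring a trial by cosine similarity between an enrollment and a test embedding; the threshold and dimensions are assumptions, and obtaining the embeddings (a forward pass through an internal CNN layer) is left abstract.

```python
# Illustrative verification scoring: compare an enrollment embedding and
# a test embedding with cosine similarity. Embedding extraction (a CNN
# forward pass) is assumed; dimensions and threshold are placeholders.
import numpy as np

def cosine_score(enroll_emb: np.ndarray, test_emb: np.ndarray) -> float:
    e = enroll_emb / np.linalg.norm(enroll_emb)
    t = test_emb / np.linalg.norm(test_emb)
    return float(e @ t)

rng = np.random.default_rng(0)
enroll, test = rng.normal(size=64), rng.normal(size=64)
accept = cosine_score(enroll, test) > 0.5   # threshold tuned on dev data
```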

We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, in an extension of the "tandem" approach. For pronunciation modeling, we investigate a model having multiple streams of AF states with soft synchrony constraints, for both audio-only and audio-visual recognition. The models are implemented as dynamic Bayesian networks, and tested on tasks from...

10.1109/icassp.2007.366989 article EN 2007-04-01
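
One common recipe for the "tandem" use of classifier outputs mentioned above is to log-transform the per-frame posteriors, decorrelate them (e.g., with a PCA-style projection), and use the result as acoustic features. The sketch below illustrates that recipe only; the dimensions and the specific transform in the paper may differ.

```python
# Sketch of a tandem feature construction: log posteriors, mean-centered,
# projected onto their principal directions via SVD. Dimensions are
# illustrative, not the paper's setup.
import numpy as np

def tandem_features(posteriors: np.ndarray, n_components: int = 25) -> np.ndarray:
    """posteriors: (n_frames, n_classes), rows summing to 1."""
    logp = np.log(posteriors + 1e-10)           # log reduces skew
    logp -= logp.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(logp, full_matrices=False)
    return logp @ vt[:n_components].T           # PCA-style decorrelation

rng = np.random.default_rng(1)
post = rng.dirichlet(np.ones(40), size=100)     # 100 frames, 40 classes
feats = tandem_features(post)                   # (100, 25)
```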

We analyze a simple hierarchical architecture consisting of two multilayer perceptron (MLP) classifiers in tandem to estimate the phonetic class conditional probabilities. In this setup, the first MLP classifier is trained using standard acoustic features. The second MLP is trained on the posterior probabilities of phonemes estimated by the first, but with a long temporal context of around 150-230 ms. Through extensive phoneme recognition experiments, and an analysis using Volterra series, we show that 1) the system yields higher...

10.1109/tasl.2010.2045943 article EN IEEE Transactions on Audio, Speech, and Language Processing 2010-03-19
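
A small sketch of how the second MLP's long-context input can be built: posterior frames from the first MLP are stacked over a window. Assuming a 10 ms frame shift, a ±10 frame context covers 21 frames, i.e. about 210 ms, inside the 150-230 ms range the abstract mentions; the window size here is an assumption.

```python
# Stack posterior frames over a long context window to form the second
# MLP's input. With a 10 ms frame shift, left=right=10 gives a 21-frame
# (~210 ms) window; the exact context in the paper may differ.
import numpy as np

def stack_context(posteriors: np.ndarray, left: int = 10, right: int = 10):
    padded = np.pad(posteriors, ((left, right), (0, 0)), mode="edge")
    n = posteriors.shape[0]
    return np.stack(
        [padded[t : t + left + right + 1].ravel() for t in range(n)]
    )  # (n_frames, (left + right + 1) * n_classes)

post = np.random.default_rng(2).dirichlet(np.ones(40), size=200)
ctx = stack_context(post)   # (200, 21 * 40), input to the second MLP
```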

Biometric systems are exposed to spoofing attacks which may compromise their security, and voice biometrics based on automatic speaker verification (ASV) is no exception. To increase the robustness against such attacks, anti-spoofing systems have been proposed for the detection of replay, synthesis and voice conversion-based attacks. However, most anti-spoofing techniques are loosely integrated with the ASV system. In this work, we develop a new integration neural network which jointly processes the embeddings extracted from the ASV and anti-spoofing systems in order to detect...

10.1109/tifs.2020.3039045 article EN IEEE Transactions on Information Forensics and Security 2020-11-18
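
The sketch below is a hedged illustration of the integration idea described in the abstract: a small network takes the ASV embedding and the anti-spoofing (countermeasure) embedding of a trial and produces a joint decision. The embedding dimensions, depth, and two-class output are assumptions for illustration, not the paper's architecture.

```python
# Illustrative integration network: concatenate the ASV and countermeasure
# embeddings of a trial and classify jointly. Sizes are placeholders.
import torch
import torch.nn as nn

class IntegrationNet(nn.Module):
    def __init__(self, asv_dim=192, cm_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(asv_dim + cm_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),   # genuine target vs. everything else
        )

    def forward(self, asv_emb, cm_emb):
        return self.net(torch.cat([asv_emb, cm_emb], dim=-1))

logits = IntegrationNet()(torch.randn(4, 192), torch.randn(4, 128))
```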

10.1109/icassp49660.2025.10888697 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10890852 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10890800 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10889684 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Embarrassment is a social emotion that shares many characteristics with social anxiety (SA). Most people experience embarrassment in their daily lives, but it is quite overlooked in research. We characterized embarrassment through an interdisciplinary approach, introducing a behavioral paradigm and applying machine learning approaches, including acoustic analyses. 33 participants wrote about an embarrassing experience and then, without knowing so beforehand, had to read it out loud to the conductor. Embarrassment was then examined using two different approaches:...

10.1038/s41598-025-94051-9 article EN cc-by-nc-nd Scientific Reports 2025-03-20

Posterior probabilities of sub-word units have been shown to be an effective front-end for ASR. However, attempts to model this type of features either do not benefit from modeling context-dependent phonemes, or use an inefficient distribution to estimate the state likelihood. This paper presents a novel acoustic model for posterior features that overcomes these limitations. The proposed model can be seen as an HMM where the score associated with each state is the KL divergence between a distribution characterizing the state and the test utterance. This KL-based model establishes...

10.21437/interspeech.2008-110 article EN Interspeech 2008 2008-09-22
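
As a sketch of the scoring rule described above: each HMM state s holds a categorical distribution y_s over sub-word units, each frame t yields a posterior feature vector z_t, and the local score is their KL divergence. The divergence direction shown here is one of the variants studied in this line of work:

```latex
% KL-based local score: y_s is the state's categorical distribution over
% D sub-word units, z_t the posterior feature vector at frame t. Decoding
% minimizes the divergence accumulated along the state sequence rather
% than maximizing a likelihood.
\[
  S(s, t) = \mathrm{KL}\left(y_s \,\middle\|\, z_t\right)
          = \sum_{d=1}^{D} y_s(d) \, \log \frac{y_s(d)}{z_t(d)}
\]
```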

Development of countermeasures to detect attacks performed on speaker verification systems through presentation of forged or altered speech samples is a challenging and open research problem. Typically, this problem is approached by extracting features through conventional short-term processing and feeding them to a binary classifier. In this article, we develop a convolutional neural network-based approach that learns, in an end-to-end manner, both the features and the classifier from the raw signal. Through investigations on two publicly...

10.1109/btas.2017.8272715 article EN 2017-10-01

Automatic Gender Recognition (AGR) is the task of identifying the gender of a speaker given a speech signal. Standard approaches extract features like fundamental frequency and cepstral features from the signal and train a binary classifier. Inspired by recent works in the areas of automatic speech recognition (ASR) and presentation attack detection, we present a novel approach where relevant features and the classifier are jointly learned from the raw signal in an end-to-end manner. We propose a convolutional neural network (CNN) based architecture that consists of: (1) convolution layers, which...

10.21437/interspeech.2018-1240 article EN Interspeech 2018 2018-08-28

Automatic speaker verification systems can be spoofed through recorded, synthetic, or voice-converted speech of target speakers. To make these systems practically viable, the detection of such attacks, referred to as presentation attacks, is of paramount interest. In that direction, this paper investigates two aspects: 1) a novel approach to detect presentation attacks where, unlike conventional approaches, no signal modeling related assumptions are made; rather, the attacks are detected by computing first-order and second-order spectral...

10.1109/taslp.2017.2743340 article EN IEEE/ACM Transactions on Audio, Speech, and Language Processing 2017-08-23
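
To make "first-order and second-order spectral statistics" concrete, the sketch below computes the mean and variance of the log-magnitude spectrum across frames of an utterance. The framing parameters are assumptions, and the exact statistics and features used in the paper may differ.

```python
# Per-utterance first-order (mean) and second-order (variance) statistics
# of the log-magnitude spectrum across frames. Framing parameters are
# illustrative; the paper's exact statistics may differ.
import numpy as np

def spectral_statistics(wav: np.ndarray, n_fft: int = 512, hop: int = 160):
    frames = np.lib.stride_tricks.sliding_window_view(wav, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    log_spec = np.log(spec + 1e-10)
    return log_spec.mean(axis=0), log_spec.var(axis=0)

wav = np.random.default_rng(3).normal(size=16000)   # 1 s at 16 kHz
mu, var = spectral_statistics(wav)                  # each of shape (257,)
```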

During depression, neurophysiological changes can occur which may affect laryngeal control, i.e. the behaviour of the vocal folds. Characterising these changes in a precise manner from speech signals is a non-trivial task, as this typically involves reliable separation of the voice source information from the signal. In this paper, by exploiting the abilities of CNNs to learn task-relevant information from raw input signals, we investigate several methods to model voice source related information for depression detection. Specifically, modelling the low-pass filtered linear prediction residual...

10.1109/icassp.2019.8683498 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17
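
A minimal sketch of one voice-source signal named in the abstract, the low-pass filtered linear prediction (LP) residual: estimate LP coefficients, inverse-filter the speech to obtain the residual, then low-pass filter it. The LP order and cutoff frequency below are illustrative assumptions, not the paper's settings.

```python
# LP residual extraction followed by low-pass filtering. The residual is
# obtained by inverse filtering speech with its LP coefficients; order
# and cutoff here are illustrative choices.
import numpy as np
import librosa
import scipy.signal as sig

def lowpass_lp_residual(wav, sr=16000, order=20, cutoff_hz=1000):
    a = librosa.lpc(wav, order=order)        # [1, a1, ..., ap]
    residual = sig.lfilter(a, [1.0], wav)    # inverse (whitening) filter
    b, a_lp = sig.butter(4, cutoff_hz / (sr / 2), btype="low")
    return sig.lfilter(b, a_lp, residual)    # low-pass filtered residual

wav = np.random.default_rng(4).normal(size=16000).astype(np.float64)
src = lowpass_lp_residual(wav)               # candidate CNN input signal
```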

In the light of the current COVID-19 pandemic, the need for remote digital health assessment tools is greater than ever. This statement is especially pertinent for elderly and vulnerable populations. In this regard, the INTERSPEECH 2020 Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) Challenge offers competitors the opportunity to develop speech and language-based systems for the task of Alzheimer's Dementia (AD) recognition. The challenge data consists of speech recordings and their transcripts, and the work presented herein is an assessment of different contemporary...

10.21437/interspeech.2020-2635 article EN Interspeech 2020 2020-10-25

Respiration is an essential and primary mechanism for speech production. We first inhale and then produce speech while exhaling. When we run out of breath, we stop speaking and inhale. Though this process is involuntary, speech production involves a systematic outflow of air during exhalation, characterized by the linguistic content and prosodic factors of the utterance. Thus speech and respiration are closely related, and modeling this relationship makes sensing respiratory dynamics directly from speech plausible; this is, however, not well explored. In this article,...

10.1016/j.neunet.2021.03.029 article EN cc-by-nc-nd Neural Networks 2021-04-05

In this paper, we investigate the significance of contextual information in a phoneme recognition system using the hidden Markov model - artificial neural network paradigm. Contextual information is probed at the feature level as well as at the output of the multilayered perceptron. At the feature level, we analyze and compare different methods to model sub-phonemic classes. To exploit the contextual information at the output of the multilayered perceptron, we propose hierarchical estimation of posterior probabilities. The best phoneme recognition accuracy (excluding silence) of 73.4% on the TIMIT database is comparable to that of the state-of-the-art...

10.1109/icassp.2008.4518643 article EN IEEE International Conference on Acoustics, Speech and Signal Processing 2008-03-01