Rahim Saeidi

ORCID: 0000-0002-9084-0091
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Blind Source Separation Techniques
  • Advanced Adaptive Filtering Techniques
  • Advanced Data Compression Techniques
  • Face Recognition and Analysis
  • Phonetics and Phonology Research
  • Topic Modeling
  • Natural Language Processing Techniques
  • Bayesian Methods and Mixture Models
  • IoT-based Smart Home Systems
  • Face and Expression Recognition
  • Indoor and Outdoor Localization Technologies
  • Biometric Identification and Security
  • Neural Networks and Applications
  • Data Management and Algorithms
  • Infant Health and Development
  • Handwritten Text Recognition Techniques
  • Time Series Analysis and Forecasting
  • Text and Document Classification Technologies
  • Geographic Information Systems Studies

Affiliations

Aalto University
2012-2018

Cirrus Logic (United Kingdom)
2018

University of Technology
2015

Radboud University Nijmegen
2012-2014

University of Eastern Finland
2010-2014

Finland University
2010-2014

Aalborg University
2011

Academic Center for Education, Culture and Research
2009

Iranian Institute for Health Sciences Research
2007-2008

Shariati Hospital
2007

Publications

In speech and audio applications, the short-term signal spectrum is often represented using mel-frequency cepstral coefficients (MFCCs) computed from a windowed discrete Fourier transform (DFT). Windowing reduces spectral leakage, but the variance of the estimate remains high. An elegant extension to the DFT is the so-called multitaper method, which uses multiple time-domain windows (tapers) with frequency-domain averaging. Multitapers have received little attention in speech processing even though they produce...

10.1109/tasl.2012.2191960 article EN IEEE Transactions on Audio Speech and Language Processing 2012-04-11
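Since the abstract is truncated, here is a minimal sketch of the multitaper idea it describes: several orthogonal tapers window the same frame, and their periodograms are averaged to reduce variance. Sine tapers and uniform weights are assumed purely for illustration; the paper's exact taper families and weights are not reproduced here.

```python
import numpy as np

def sine_tapers(n, k):
    """Generate k sine tapers of length n (Riedel & Sidorenko family)."""
    j = np.arange(1, k + 1)[:, None]          # taper index
    t = np.arange(1, n + 1)[None, :]          # time index
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * t / (n + 1))

def multitaper_spectrum(frame, n_tapers=6, n_fft=512):
    """Average the periodograms of one frame windowed by each taper.
    Uniform taper weights are an illustrative assumption."""
    tapers = sine_tapers(len(frame), n_tapers)
    specs = np.abs(np.fft.rfft(tapers * frame, n_fft, axis=1)) ** 2
    return specs.mean(axis=0)
```

The averaged estimate then feeds the usual mel filterbank and DCT stages of MFCC extraction in place of the single-window periodogram.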

Speaker recognition systems trained on long-duration utterances are known to perform significantly worse when short test segments are encountered. To address this mismatch, we analyze the effect of the variability of phoneme distributions in speech and of i-vector length. We demonstrate that, as the utterance duration is decreased, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and non-linear fashion, respectively. Assuming an additive noise model in the i-vector space, we propose three different strategies for its...

10.1109/icassp.2013.6639154 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2013-05-01

This paper investigates the effect of utterance duration on the calibration of a modern i-vector speaker recognition system with probabilistic linear discriminant analysis (PLDA) modeling. An approach to deal with these effects using quality measure functions (QMFs), included in the calibration transformation, is proposed. Extensive experiments are performed in order to evaluate robustness for conditions unseen during the training of the calibration parameters. Using the latest NIST corpora for evaluation, the results highlight the importance of considering metrics like...

10.1109/tasl.2013.2279332 article EN IEEE Transactions on Audio Speech and Language Processing 2013-08-22
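As a sketch of quality-aware calibration in the spirit described above: a linear score calibration augmented with a QMF term. The log-duration QMF and the logistic-regression training below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_calibration(scores, durations, labels):
    """Fit calibration weights via logistic regression so that calibrated
    scores behave like log-likelihood ratios (standard practice).
    The log-duration quality measure is an illustrative choice."""
    X = np.column_stack([scores, np.log(durations)])
    lr = LogisticRegression().fit(X, labels)   # labels: 1 target, 0 non-target
    return np.r_[lr.intercept_, lr.coef_.ravel()]

def calibrate(scores, durations, w):
    # s' = w0 + w1 * s + w2 * qmf(duration)
    return w[0] + w[1] * scores + w[2] * np.log(durations)
```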

Many short-time Fourier transform (STFT) based single-channel speech enhancement algorithms are focused on estimating the clean spectral amplitude from the noisy observed signal in order to suppress additive noise. To this end, they utilize the amplitude information and the corresponding a priori and a posteriori SNRs, while they employ the noisy phase when reconstructing the enhanced signal. This paper presents two contributions: i) reconsidering the relation between group delay deviation and phase deviation, and ii) proposing a closed-loop approach...

10.1109/lsp.2013.2286748 article EN IEEE Signal Processing Letters 2013-10-21
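The group delay quantities the paper builds on can be computed directly from STFT phase. A minimal sketch, assuming group delay as the negative frequency-derivative of the unwrapped phase and taking its deviation from the frame mean; the paper's precise definition may differ.

```python
import numpy as np

def group_delay_deviation(stft_frame):
    """Group delay from one complex STFT frame (freq-axis vector):
    negative frequency-derivative of the unwrapped phase, with the
    deviation taken from the frame mean (an assumption for illustration)."""
    phase = np.unwrap(np.angle(stft_frame))
    gd = -np.diff(phase)                      # per-bin group delay
    return gd - gd.mean()
```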

Single-channel speech separation algorithms frequently ignore the issue of accurate phase estimation while reconstructing the enhanced signal. Instead, they directly employ the mixed-signal phase for signal reconstruction, which leads to undesired traces of the interfering source in the target signal. In this paper, assuming a given knowledge of the spectrum amplitude, we present a solution to estimate the phase information of the sources from a single-channel mixture observation. We first investigate the effectiveness of the proposed method by employing known...

10.21437/interspeech.2012-436 article EN Interspeech 2012 2012-09-09
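One way to see why known source amplitudes constrain the unknown phases, as exploited above: per time-frequency bin, the mixture X = S1 + S2 with known |S1| and |S2| forms a triangle, fixing the phases up to a sign ambiguity. The sketch below uses this law-of-cosines geometry; the sign choice and the paper's actual estimation criterion are not reproduced.

```python
import numpy as np

def phases_from_amplitudes(X, A1, A2, eps=1e-12):
    """Per-bin phase candidates for two sources with known amplitudes
    A1, A2, given the complex mixture X = S1 + S2 (illustrative sketch;
    the +/- ambiguity is resolved here arbitrarily)."""
    mag_x = np.abs(X)
    # angle between S1 and X from the law of cosines, clipped for safety
    cos1 = np.clip((mag_x**2 + A1**2 - A2**2) / (2 * mag_x * A1 + eps), -1, 1)
    phi1 = np.angle(X) + np.arccos(cos1)      # one of the two valid solutions
    s1 = A1 * np.exp(1j * phi1)
    phi2 = np.angle(X - s1)                    # S2 completes the triangle
    return phi1, phi2
```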

I4U is a joint entry of nine research institutes and universities across four continents to NIST SRE 2012. It started with a brief discussion during the Odyssey 2012 workshop in Singapore. An online group was soon set up, providing a platform for discussing different issues surrounding SRE'12. Noisy test segments, uneven multi-session training, variable enrollment duration, and the issue of open-set identification were actively discussed, leading to various solutions integrated into the submission. The submission and several of its 17...

10.21437/interspeech.2013-472 article EN Interspeech 2013 2013-08-25

In this paper, three utterance modelling approaches, namely Gaussian Mean Supervector (GMS), i-vector, and Gaussian Posterior Probability Supervector (GPPS), are applied to the accent recognition problem. For each modelling method, different classifiers, namely Support Vector Machine (SVM), Naive Bayesian Classifier (NBC), and Sparse Representation Classifier (SRC), are employed to find suitable matches between the modelling schemes and the classifiers. The evaluation database is formed by using English utterances of speakers whose native languages are Russian,...

10.1109/icassp.2013.6639089 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2013-05-01

This paper evaluates the performance of twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The database provides a challenging and realistic test-bed for current state-of-the-art techniques. Results in terms of equal error rate (EER), half total error rate (HTER), and detection error trade-off (DET) confirm that the best performing systems are based on variability modeling and fusion of several sub-systems. Nevertheless, the good old UBM-GMM approach is still competitive. The results also show that the use...

10.1109/icb.2013.6613025 preprint EN 2013-06-01

Text-independent speaker verification under additive noise corruption is considered. In the popular mel-frequency cepstral coefficient (MFCC) front-end, the conventional Fourier-based spectrum estimation is substituted with weighted linear predictive methods, which have earlier shown success in noise-robust speech recognition. Two temporally weighted variants of linear predictive modeling are introduced to speaker verification, and they are compared to FFT, which is normally used in computing MFCCs, and to conventional linear prediction. The effect of speech enhancement (spectral subtraction) on...

10.1109/lsp.2010.2048649 article EN IEEE Signal Processing Letters 2010-04-20
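A minimal sketch of temporally weighted linear prediction of the kind referenced above, assuming the common short-time-energy (STE) weight and solving the weighted normal equations directly; the stabilized variant (SWLP) and the paper's parameter choices are not reproduced.

```python
import numpy as np

def wlp(frame, order=20, ste_lag=20, eps=1e-9):
    """Weighted LP: minimize sum_t w(t) e(t)^2 with an STE weight
    (one common choice in the WLP literature; values are illustrative)."""
    n = len(frame)
    pad = max(order, ste_lag)
    x = np.concatenate([np.zeros(pad), frame])
    # STE weight: energy of the ste_lag samples preceding each instant
    w = np.array([np.sum(x[pad + t - ste_lag : pad + t] ** 2)
                  for t in range(n)]) + eps
    # rows of X are the delayed signals x(t-1) ... x(t-order)
    X = np.stack([x[pad - i : pad - i + n] for i in range(1, order + 1)])
    R = (X * w) @ X.T                          # weighted autocorrelations
    r = (X * w) @ x[pad : pad + n]
    a = np.linalg.solve(R, r)                  # predictor coefficients
    # all-pole spectrum (gain term omitted for brevity)
    spec = 1.0 / (np.abs(np.fft.rfft(np.r_[1.0, -a], 512)) ** 2 + eps)
    return a, spec
```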

In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification, a single-channel algorithm is proposed which provides an estimate of the signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm. The separation system consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of the sinusoidal parameters, finding optimal codevectors from pre-trained codebooks. In evaluating the system, we start from a situation where we have prior...

10.1109/tasl.2012.2208627 article EN IEEE Transactions on Audio Speech and Language Processing 2012-07-13

Linear discriminant analysis (LDA) is a powerful technique in pattern recognition for reducing the dimensionality of data vectors. It maximizes discriminability by retaining only those directions that minimize the ratio of within-class to between-class variance. In this paper, using the same principles as for conventional LDA, we propose to employ the uncertainties of the noisy or distorted input data in order to estimate maximally discriminant directions. We demonstrate the efficiency of the proposed uncertain LDA on two applications with state-of-the-art...

10.1109/tpami.2015.2481420 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2015-09-23
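One plausible reading of the idea, as a sketch: carry a per-sample uncertainty covariance into the within-class scatter and then solve the usual generalized eigenproblem. This illustrates the principle only; the paper's estimator may differ.

```python
import numpy as np
from scipy.linalg import eigh

def uncertain_lda(X, y, covs, n_dims):
    """LDA where each sample X[i] carries an uncertainty covariance covs[i]
    (shape (N, d, d)). The uncertainties are folded into the within-class
    scatter; Sb v = lambda Sw v is then solved as usual."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc) + covs[y == c].sum(axis=0)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    vals, vecs = eigh(Sb, Sw)                  # generalized eigenproblem
    return vecs[:, ::-1][:, :n_dims]           # leading directions
```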

Usually the mel-frequency cepstral coefficients (MFCCs) are derived via a Hamming-windowed DFT spectrum. In this paper, we advocate using a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide robust spectrum estimates but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for...

10.21437/interspeech.2010-724 article EN Interspeech 2010 2010-09-26

Previous studies on the performance evaluation of single-channel speech separation (SCSS) algorithms mostly focused on automatic speech recognition (ASR) accuracy as their measure. Assessing the separated signals by metrics other than this has the benefit that the results are expected to carry over to applications beyond ASR. In this paper, in addition to conventional quality metrics (PESQ and SNRloss), we also evaluate...

10.1109/icassp.2012.6287819 article EN 2012-03-01

Spectrogram factorisation using a dictionary of spectrotemporal atoms has been successfully employed to separate a mixed audio signal into its source components. When atoms from multiple sources are included in a combined dictionary, the relative weights of the activated atoms reveal the likely sources as well as the content of each source. Enforcing sparsity on the activations produces solutions where only a small number of atoms are active at a time. In this paper we propose group sparsity to restrict the number of simultaneously active sources, allowing us to discover the identity of an unknown...

10.21437/interspeech.2012-571 article EN Interspeech 2012 2012-09-09
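A sketch of the group-sparsity idea on top of standard KL-NMF activation updates, with each group gathering the atoms of one source; the penalty form, its weight lam, and the update rule here are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def group_sparse_nmf_activations(V, W, groups, lam=0.1, n_iter=100, eps=1e-9):
    """KL-NMF activations with a group penalty lam * sum_g ||H_g||_2.
    V: (F, T) mixture spectrogram; W: (F, K) fixed combined dictionary;
    groups: length-K labels mapping each atom to a source."""
    groups = np.asarray(groups)
    F, T = V.shape
    H = np.random.rand(W.shape[1], T) + eps
    ones = np.ones((F, T))
    for _ in range(n_iter):
        # gradient of the group penalty: each block scaled by 1/||H_g||
        pen = np.zeros_like(H)
        for g in np.unique(groups):
            block = H[groups == g]
            pen[groups == g] = block / (np.linalg.norm(block) + eps)
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.T @ ones + lam * pen + eps)
    return H
```

The per-group activation norms then indicate which sources are likely present in the mixture.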

Degraded signal quality and incomplete voice probes have severe effects on the performance of a speaker recognition system. Unified audio characteristics (UACs) have been proposed to quantify multi-condition degradation into posterior probabilities of quality classes. In previous work, we showed that UAC-based vectors (q-vectors) are efficient at the score-normalization stage. Hence, we motivate q-vector based calibration by using functions of quality estimates (FQEs). In this work we examine the robustness of the approaches in low-SNR and short-duration...

10.21437/odyssey.2016-52 article EN 2016-06-21

Linear prediction is one of the most established techniques in signal estimation, and it is widely utilized in speech processing. It has long been understood that the nerve firing rate in the human auditory system can be approximated by a power-law non-linearity, and this is the motivation behind using perceptual linear prediction for extracting acoustic features in a variety of speech processing applications. In this paper, we revisit the application of the power-law non-linearity in spectrum estimation by compressing/expanding the spectrum in autocorrelation-based linear prediction. The development...

10.1109/taslp.2015.2493366 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2015-10-22
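A minimal sketch of the compress-then-predict idea, assuming a simple pipeline: raise the power spectrum to a power alpha, inverse-transform it to a pseudo-autocorrelation, and solve the LP normal equations. alpha = 0.33 echoes the auditory power law but is an illustrative choice.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def powerlaw_lp(frame, order=20, alpha=0.33, n_fft=512):
    """Linear prediction on a power-law compressed spectrum (sketch)."""
    pspec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    compressed = pspec ** alpha
    # pseudo-autocorrelation of the compressed spectrum (real, symmetric)
    r = np.fft.irfft(compressed)[: order + 1]
    # Yule-Walker equations via the Toeplitz solver
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    return a                                   # predictor coefficients
```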

Usually the mel-frequency cepstral coefficients are estimated either from a periodogram or from a windowed periodogram. We state a general estimator which also includes multitaper estimators. We propose approximations of the variance and bias of the estimate of each coefficient. By using Monte Carlo computations, we demonstrate that the approximations are accurate. Using the proposed formulas, the peak matched multiple windows estimator is shown to have a low mean square error (squared bias + variance) on speech-like processes. It is also shown to perform slightly better in the NIST 2006 speaker...

10.1109/lsp.2010.2040228 article EN IEEE Signal Processing Letters 2010-01-20
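The "(squared bias + variance)" above refers to the standard decomposition of the mean square error of an estimator, here a cepstral coefficient estimate \hat{c} of the true value c:

```latex
\mathrm{MSE}(\hat{c}) \;=\; \mathbb{E}\!\left[(\hat{c}-c)^2\right]
\;=\; \underbrace{\bigl(\mathbb{E}[\hat{c}]-c\bigr)^2}_{\text{squared bias}}
\;+\; \underbrace{\mathrm{Var}(\hat{c})}_{\text{variance}}
```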

Regularization of linear prediction based mel-frequency cepstral coefficient (MFCC) extraction in speaker verification is considered. Commonly, MFCCs are extracted from the discrete Fourier transform (DFT) spectrum of speech frames. In this paper, the DFT spectrum estimate is replaced with the recently proposed regularized linear prediction (RLP) method. Temporally weighted variants, weighted LP (WLP) and stabilized WLP (SWLP), which have earlier shown success in recognition tasks, are also introduced. A novel type of double autocorrelation (DAC) lag...

10.1109/lsp.2012.2184284 article EN IEEE Signal Processing Letters 2012-01-13

An evaluation of the verification and calibration performance of a face recognition system based on inter-session variability modelling is presented. As an extension to calibration through a linear transformation of scores, categorical calibration is introduced as a way to include additional information about the images for calibration. The cost of likelihood ratio, which is a well-known measure in the speaker recognition field, is used as the metric. The results obtained from challenging mobile biometrics and surveillance camera databases indicate that linearly calibrated...

10.1049/iet-bmt.2013.0066 article EN cc-by-nc-nd IET Biometrics 2014-02-26
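The cost of likelihood ratio (Cllr) metric mentioned above has a standard closed form over target and non-target scores s interpreted as log-likelihood ratios:

```latex
C_{\mathrm{llr}} \;=\; \frac{1}{2}\left[
\frac{1}{N_{\mathrm{tar}}}\sum_{i\in\mathrm{tar}} \log_2\!\bigl(1+e^{-s_i}\bigr)
\;+\;
\frac{1}{N_{\mathrm{non}}}\sum_{j\in\mathrm{non}} \log_2\!\bigl(1+e^{\,s_j}\bigr)
\right]
```

Well-calibrated scores drive both sums down; a system that outputs s = 0 everywhere attains the reference value of 1 bit.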

In this paper, we study the impact of exploiting spectral phase information to further improve the speech quality of single-channel speech enhancement algorithms. In particular, we focus on two required steps in a typical enhancement system, namely: the parameter estimation, solved by a minimum mean square error (MMSE) estimator of the amplitude, followed by the signal reconstruction stage, where the observed noisy phase is often used. In contrast to the conventional Wiener filter, a new MMSE estimator is derived which takes the clean phase into account as prior information. In our...

10.1109/icassp.2013.6639113 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2013-05-01
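For context on the two steps named above, a minimal sketch of the classic phase-blind pipeline the paper improves upon: an amplitude gain from a decision-directed a priori SNR (a Wiener gain is used here for brevity, in place of the paper's phase-aware MMSE estimator), with the noisy phase implicitly reused at reconstruction.

```python
import numpy as np

def wiener_enhance(stft_noisy, noise_psd, alpha=0.98, eps=1e-12):
    """Phase-blind baseline: Wiener gain with decision-directed a priori SNR.
    stft_noisy: (freq, frames) complex STFT; noise_psd: (freq,) estimate."""
    F, T = stft_noisy.shape
    enhanced = np.zeros_like(stft_noisy)
    prev_amp2 = np.abs(stft_noisy[:, 0]) ** 2
    for t in range(T):
        gamma = np.abs(stft_noisy[:, t]) ** 2 / (noise_psd + eps)  # a posteriori SNR
        xi = alpha * prev_amp2 / (noise_psd + eps) \
             + (1 - alpha) * np.maximum(gamma - 1, 0)              # a priori SNR
        gain = xi / (1 + xi)                                        # Wiener gain
        enhanced[:, t] = gain * stft_noisy[:, t]  # noisy phase kept
        prev_amp2 = np.abs(enhanced[:, t]) ** 2
    return enhanced
```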