Fahimeh Bahmaninezhad

ORCID: 0000-0001-8308-7989
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Advanced Adaptive Filtering Techniques
  • Phonetics and Phonology Research
  • Speech and dialogue systems
  • Authorship Attribution and Profiling
  • Topic Modeling
  • Hearing Loss and Rehabilitation
  • Blind Source Separation Techniques
  • Linguistics and Cultural Studies

The University of Texas at Dallas
2017-2021

Robust Chip (United States)
2019

Sharif University of Technology
2012-2014

In this study, we present the systems submitted by the Center for Robust Speech Systems (CRSS) from UTDallas to NIST SRE 2018 (SRE18). Three alternative front-end speaker embedding frameworks are investigated: (i) i-vector, (ii) x-vector, and (iii) a modified triplet system (t-vector). As in previous SREs, language mismatch between training and enrollment/test data, so-called domain mismatch, remains a major challenge in this evaluation. In addition, SRE18 also introduces a small portion of...

10.1109/icassp.2019.8683097 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17
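
Regardless of which front-end produced them, such embeddings are commonly compared with cosine scoring after length normalization. A minimal sketch, assuming hypothetical 512-dimensional enroll/test vectors in place of real i-/x-/t-vector output:

```python
import numpy as np

def length_normalize(x: np.ndarray) -> np.ndarray:
    """Project an embedding onto the unit sphere (standard i-/x-vector practice)."""
    return x / np.linalg.norm(x)

def cosine_score(enroll: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between two length-normalized speaker embeddings."""
    return float(np.dot(length_normalize(enroll), length_normalize(test)))

# Hypothetical 512-dimensional embeddings standing in for real front-end output.
rng = np.random.default_rng(0)
enroll, test = rng.standard_normal(512), rng.standard_normal(512)
print(f"verification score: {cosine_score(enroll, test):.4f}")
```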

Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years; developed solutions are mostly in the frequency domain. Recently, a raw audio waveform based time-domain audio separation network (TasNet) was introduced, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared against state-of-the-art solutions in the frequency domain. In this study, we incorporate effective components of TasNet into a frequency-domain method. We compare...

10.21437/interspeech.2019-3181 article EN Interspeech 2019 2019-09-13
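
The Si-SNR metric cited above is straightforward to compute. A minimal sketch on synthetic signals (the tone and noise level are illustrative stand-ins for separated speech):

```python
import numpy as np

def si_snr(estimate: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant source-to-noise ratio in dB."""
    # Remove the mean so the measure is offset-invariant.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled reference.
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return float(10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps)))

# Synthetic example: a clean tone and a slightly noisy estimate of it.
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 220 * t)
noisy_estimate = clean + 0.05 * np.random.default_rng(0).standard_normal(t.size)
print(f"SI-SNR: {si_snr(noisy_estimate, clean):.2f} dB")
```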

i-Vector feature representation with probabilistic linear discriminant analysis (PLDA) scoring in speaker recognition systems has recently achieved effective performance even under channel mismatch conditions. In general, experiments carried out using this combined strategy employ linear discriminant analysis (LDA) after the i-vector extraction phase to suppress irrelevant directions, such as those introduced by noise or channel distortions. However, speaker-related and non-speaker-related variability present in the data may prevent LDA from finding the best...

10.1109/icassp.2017.7953190 article EN 2017-03-01
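
As a hedged illustration of the LDA step the abstract describes, the sketch below reduces hypothetical i-vectors with scikit-learn's LDA before back-end scoring; speaker counts and dimensions are illustrative, and a real system would fit PLDA on the reduced vectors:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_speakers, utts_per_speaker, ivec_dim = 20, 10, 100  # real i-vectors are ~400-600 dim

# Hypothetical i-vectors: one cluster per speaker plus within-speaker noise.
speaker_means = rng.standard_normal((n_speakers, ivec_dim))
ivectors = np.vstack([m + 0.5 * rng.standard_normal((utts_per_speaker, ivec_dim))
                      for m in speaker_means])
labels = np.repeat(np.arange(n_speakers), utts_per_speaker)

# LDA suppresses directions dominated by within-speaker (nuisance) variability.
lda = LinearDiscriminantAnalysis(n_components=n_speakers - 1)
reduced = lda.fit_transform(ivectors, labels)
print(reduced.shape)  # (200, 19); a PLDA back-end would then score these vectors
```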

The I4U consortium was established to facilitate a joint entry to the NIST speaker recognition evaluations (SRE). The latest edition of such a submission was in SRE 2018, in which it was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U's participation in the evaluation series. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is our intention to present a shared view of the advancements, progress, and major paradigm shifts that we have witnessed...

10.21437/interspeech.2019-1533 preprint EN Interspeech 2019 2019-09-13
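
Sub-system fusion of the kind I4U performs is commonly implemented as logistic-regression score fusion. A minimal sketch on synthetic scores (three stand-in sub-systems rather than the twelve in the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials = 1000
is_target = rng.integers(0, 2, n_trials)

# Synthetic scores from three sub-systems; target trials score higher on average.
scores = np.column_stack([
    is_target * 1.5 + rng.standard_normal(n_trials) for _ in range(3)
])

# Logistic regression learns per-system weights and a bias (calibration + fusion).
fuser = LogisticRegression().fit(scores, is_target)
fused = scores @ fuser.coef_.ravel() + fuser.intercept_
print("fusion weights:", np.round(fuser.coef_.ravel(), 3))
```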

This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE). We developed several UBM and DNN i-vector based speaker recognition systems with different data sets and feature representations. Given that the emphasis of this NIST SRE is on language mismatch between training and enrollment/test data, so-called domain mismatch, in our system...

10.21437/interspeech.2017-555 preprint EN Interspeech 2017 2017-08-16
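
A minimal sketch of the UBM component mentioned above, fit as a diagonal-covariance GMM over pooled frame-level features; the frame data, component count, and dimensions are toy stand-ins for a real hours-of-speech setup:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical pooled acoustic frames (e.g., MFCCs) from many speakers.
frames = rng.standard_normal((5000, 20))

# The UBM is a large diagonal-covariance GMM trained on everyone's data.
ubm = GaussianMixture(n_components=64, covariance_type="diag", max_iter=50).fit(frames)

# Per-utterance posteriors against the UBM yield the sufficient statistics
# that feed i-vector extraction.
posteriors = ubm.predict_proba(frames[:100])
print("zero-order stats:", np.round(posteriors.sum(axis=0)[:5], 2))
```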

This study presents the systems submitted by the University of Texas at Dallas, Center for Robust Speech Systems (UTD-CRSS) to the MGB-3 Arabic Dialect Identification (ADI) subtask. This task is defined to discriminate between five dialects of Arabic, namely Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic. We develop multiple single systems with different front-end representations and back-end classifiers. At the front-end level, feature extraction methods such as Mel-frequency cepstral coefficients (MFCCs) and two...

10.1109/asru.2017.8268958 article EN 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2017-12-01
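
The MFCC front-end named in the abstract takes a few lines with librosa. A sketch on a synthetic signal, assuming a common 25 ms / 10 ms framing; all parameters are illustrative rather than those of the submitted systems:

```python
import numpy as np
import librosa

# Synthetic one-second signal standing in for a real dialect-ID utterance.
sr = 16000
y = np.sin(2 * np.pi * 180 * np.arange(sr) / sr).astype(np.float32)

# 20 MFCCs per 25 ms frame with a 10 ms hop, a common ASR/LID configuration.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20,
                            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
print(mfcc.shape)  # (20, n_frames)
```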

Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) The tree structure lacks adequate context generalization. (ii) It is unable to express complex context dependencies. (iii) Parameters generated from it exhibit sudden transitions between adjacent states. In order...

10.1186/1687-4722-2014-12 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2014-04-07
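
For orientation, the decision-tree structure being criticized maps answers to contextual questions onto shared acoustic distributions. A toy sketch with a regression tree (contexts and targets are synthetic stand-ins, not the paper's models), which also makes limitation (i) visible: every context routed to a leaf shares one distribution:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Hypothetical binary context questions (e.g., "is the previous phone a vowel?").
contexts = rng.integers(0, 2, size=(500, 10))
# Hypothetical per-state acoustic feature means the tree must predict.
acoustic_means = contexts @ rng.standard_normal((10, 5)) \
    + 0.1 * rng.standard_normal((500, 5))

# Each leaf shares a single prediction across all contexts routed to it.
tree = DecisionTreeRegressor(max_leaf_nodes=32).fit(contexts, acoustic_means)
print("leaves:", tree.get_n_leaves())
```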

Speech separation refers to extracting each individual speech source from a given mixed signal. Recent advancements and ongoing research in this area have made these approaches promising techniques for pre-processing of naturalistic audio streams. After incorporating deep learning into speech separation, the performance of these systems improved rapidly. The initial solutions introduced were based on analyzing the signals in the time-frequency domain with the STFT; the encoded features were then fed to a neural network separator. Most...

10.48550/arxiv.1912.07814 preprint EN other-oa arXiv (Cornell University) 2019-01-01
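
A hedged sketch of the STFT-domain pipeline the abstract summarizes: encode the mixture with an STFT, apply a per-source mask, and decode with the inverse STFT. Here an oracle ideal ratio mask stands in for the neural separator's output, and the two sinusoidal "speakers" are synthetic:

```python
import numpy as np
from scipy.signal import stft, istft

sr = 8000
t = np.arange(sr) / sr
s1 = np.sin(2 * np.pi * 200 * t)   # synthetic "speaker" 1
s2 = np.sin(2 * np.pi * 800 * t)   # synthetic "speaker" 2
mix = s1 + s2

_, _, S1 = stft(s1, fs=sr, nperseg=256)
_, _, S2 = stft(s2, fs=sr, nperseg=256)
_, _, M = stft(mix, fs=sr, nperseg=256)

# Ideal ratio mask per source; a separator network would estimate this instead.
eps = 1e-8
mask1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)
_, est1 = istft(mask1 * M, fs=sr, nperseg=256)
print("estimated source 1 shape:", est1.shape)
```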

Speech synthesis systems provided for the Persian language so far need various large-scale speech corpora to synthesize several target speakers' voices. Accordingly, synthesizing with a small amount of data seems to be essential for Persian. Taking advantage of speaker adaptation makes it possible to generate remarkable quality when data are limited. Here we conducted this method for the first time for Persian. This paper describes a speaker adaptation approach based on Hidden Markov Models (HMMs) using the FARsi DATabase (FARSDAT). In this regard, we prepared the whole...

10.1109/icosp.2012.6491556 article EN 2012-10-01

This article proposes a method to improve the performance of deterministic plus stochastic model (DSM) based feature extraction by integrating contextual information. One precious advantage of speech synthesis over recognition is that in both the training and testing phases of synthesis, contextual information is available. However, similar to recognition, this invaluable knowledge has been forgotten during acoustic feature extraction for synthesis. DSM expresses the residual of Mel-cepstral analysis through the summation of two components, namely...

10.1109/icosp.2014.7015067 article EN 2014-10-01
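
The DSM referenced above splits the excitation residual into a deterministic component below a maximum voiced frequency and a stochastic component above it. A toy numpy sketch of that split, with an illustrative cutoff and white noise standing in for a real residual signal:

```python
import numpy as np

sr, fmax_voiced = 16000, 4000  # illustrative maximum voiced frequency
rng = np.random.default_rng(0)
residual = rng.standard_normal(sr)  # stand-in for one second of residual signal

# Split in the frequency domain: deterministic below the cutoff, stochastic above.
spectrum = np.fft.rfft(residual)
freqs = np.fft.rfftfreq(residual.size, d=1 / sr)
deterministic = np.fft.irfft(np.where(freqs < fmax_voiced, spectrum, 0), n=residual.size)
stochastic = np.fft.irfft(np.where(freqs >= fmax_voiced, spectrum, 0), n=residual.size)

# The two components sum back to the original residual.
print(np.allclose(deterministic + stochastic, residual))
```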

This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE). We developed several UBM and DNN i-vector based speaker recognition systems with different data sets and feature representations. Given that the emphasis of this NIST SRE is on language mismatch between training and enrollment/test data, so-called domain mismatch, in our system...

10.48550/arxiv.1610.07651 preprint EN cc-by arXiv (Cornell University) 2016-01-01

We introduce a bilingual solution to support English as a secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments constitute: (a) a pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently a bilingual streaming transformer model, (c) a parallel encoder structure with language identification (LID) loss, and (d) a parallel encoder with an auxiliary loss for monolingual projections. We conclude that in comparison to the LID loss, our proposed approach is superior...

10.48550/arxiv.2308.06327 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01
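
A hedged PyTorch sketch of the parallel-encoder-with-LID-loss idea in point (c): two encoder branches process the same frames and an auxiliary head predicts the language. Module types and sizes here are hypothetical, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ParallelEncoderLID(nn.Module):
    """Two parallel encoder branches plus an auxiliary language-ID head."""
    def __init__(self, feat_dim=80, hidden=256, n_langs=2):
        super().__init__()
        self.branch_a = nn.GRU(feat_dim, hidden, batch_first=True)
        self.branch_b = nn.GRU(feat_dim, hidden, batch_first=True)
        self.lid_head = nn.Linear(2 * hidden, n_langs)

    def forward(self, frames):
        a, _ = self.branch_a(frames)
        b, _ = self.branch_b(frames)
        combined = torch.cat([a, b], dim=-1)              # fused output for ASR
        lid_logits = self.lid_head(combined.mean(dim=1))  # utterance-level LID
        return combined, lid_logits

model = ParallelEncoderLID()
frames = torch.randn(4, 100, 80)   # batch of 4 utterances, 100 frames each
lang = torch.tensor([0, 1, 0, 1])  # hypothetical language labels
_, lid_logits = model(frames)
aux_loss = nn.functional.cross_entropy(lid_logits, lang)  # auxiliary LID loss
print(float(aux_loss))
```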

This study presents the systems submitted by the University of Texas at Dallas, Center for Robust Speech Systems (UTD-CRSS) to the MGB-3 Arabic Dialect Identification (ADI) subtask. This task is defined to discriminate between five dialects of Arabic, namely Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic. We develop multiple single systems with different front-end representations and back-end classifiers. At the front-end level, feature extraction methods such as Mel-frequency cepstral coefficients (MFCCs) and two...

10.48550/arxiv.1710.00113 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years; developed solutions are mostly in the frequency domain. Recently, a raw audio waveform based time-domain audio separation network (TasNet) was introduced, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared against state-of-the-art solutions in the frequency domain. In this study, we incorporate effective components of TasNet into a frequency-domain method. We compare both...

10.48550/arxiv.1905.07497 preprint EN other-oa arXiv (Cornell University) 2019-01-01