- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Advanced Adaptive Filtering Techniques
- Phonetics and Phonology Research
- Speech and Dialogue Systems
- Authorship Attribution and Profiling
- Topic Modeling
- Hearing Loss and Rehabilitation
- Blind Source Separation Techniques
- Linguistics and Cultural Studies
The University of Texas at Dallas
2017-2021
Robust Chip (United States)
2019
Sharif University of Technology
2012-2014
In this study, we present the systems submitted by the Center for Robust Speech Systems (CRSS) from UTDallas to NIST SRE 2018 (SRE18). Three alternative front-end speaker embedding frameworks are investigated: (i) i-vector, (ii) x-vector, and (iii) a modified triplet system (t-vector). As in previous SREs, language mismatch between training and enrollment/test data, the so-called domain mismatch, remains a major challenge in this evaluation. In addition, SRE18 also introduces a small portion of...
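The t-vector front-end mentioned above trains embeddings with a triplet objective. As a hedged illustration only (not the authors' exact model; the cosine distance, 0.3 margin, and 512-dim embeddings are assumptions), a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull same-speaker (anchor/positive) embeddings together and push
    different-speaker (anchor/negative) ones apart by at least `margin`.
    Cosine distance and margin=0.3 are illustrative assumptions."""
    d_ap = 1.0 - F.cosine_similarity(anchor, positive)  # anchor-positive distance
    d_an = 1.0 - F.cosine_similarity(anchor, negative)  # anchor-negative distance
    return torch.clamp(d_ap - d_an + margin, min=0.0).mean()

# usage with any front-end producing (batch, dim) embeddings
loss = triplet_loss(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512))
```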
Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years; the solutions developed are mostly in the frequency domain. Recently, a raw-audio-waveform network (TasNet) was introduced, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared against state-of-the-art frequency-domain solutions. In this study, we incorporate effective components of TasNet into a frequency-domain method. We compare both...
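Si-SNR, the metric named above, projects the estimate onto the target so that rescaling the output cannot inflate the score. A minimal NumPy sketch of the standard definition (variable names are mine):

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant source-to-noise ratio in dB for 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # optimal projection of the estimate onto the target removes scale
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))
```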
The i-vector feature representation with probabilistic linear discriminant analysis (PLDA) scoring in speaker recognition systems has recently achieved effective performance even under channel mismatch conditions. In general, experiments carried out using this combined strategy employ linear discriminant analysis (LDA) after the extraction phase to suppress irrelevant directions, such as those introduced by noise or distortions. However, the speaker-related and non-speaker-related variability present in the data may prevent LDA from finding the best...
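To make the LDA step concrete, a hedged sketch with scikit-learn; the i-vector dimensionality (400), speaker count (50), and the cosine stand-in for PLDA scoring are all assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# hypothetical i-vectors (n_utterances, 400) with speaker labels
ivectors = np.random.randn(1000, 400)
speakers = np.random.randint(0, 50, size=1000)

# LDA keeps at most n_speakers - 1 discriminative directions,
# suppressing directions dominated by channel/noise variability
lda = LinearDiscriminantAnalysis(n_components=49)
projected = lda.fit_transform(ivectors, speakers)

# a full system would score with PLDA; cosine is a simplified stand-in
def cosine_score(enroll, test):
    return enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test))
```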
The I4U consortium was established to facilitate a joint entry to the NIST speaker recognition evaluations (SRE). The latest edition of such a submission was in SRE 2018, which was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U's participation in the evaluation series. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is our intention to present a shared view of the advancements, progress, and major paradigm shifts that we have witnessed...
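Fusing many sub-systems of this kind is commonly done with linear logistic regression trained on a development set; a hedged sketch under that assumption (the abstract does not specify the fusion recipe, and the trial counts here are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical scores: one column per sub-system, one row per trial
scores = np.random.randn(5000, 12)
labels = np.random.randint(0, 2, size=5000)  # 1 = target, 0 = non-target

# learns one weight per sub-system plus a bias
fuser = LogisticRegression().fit(scores, labels)
fused = fuser.decision_function(scores)  # fused scores for evaluation
```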
This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE). We developed several UBM- and DNN-based i-vector speaker recognition systems with different data sets and feature representations. Given that the emphasis of this NIST SRE is on language mismatch between training and enrollment/test data, the so-called domain mismatch, in our system...
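For readers unfamiliar with the UBM component: it is a large GMM trained on pooled features from many speakers, whose frame posteriors supply the sufficient statistics for i-vector extraction. A toy-scale sketch (real UBMs use on the order of 2048 components; 8 here only keeps the demo fast):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# hypothetical pooled features, e.g. 39-dim MFCC + deltas
features = np.random.randn(10000, 39)

# diagonal-covariance GMM as the universal background model (UBM)
ubm = GaussianMixture(n_components=8, covariance_type="diag").fit(features)

# frame posteriors; summed per utterance they give the zeroth-order
# Baum-Welch statistics used by the i-vector extractor
posteriors = ubm.predict_proba(features)
```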
This study presents the systems submitted by the University of Texas at Dallas, Center for Robust Speech Systems (UTD-CRSS) to the MGB-3 Arabic Dialect Identification (ADI) subtask. The task is defined as discriminating between five dialects of Arabic: Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic. We develop multiple single systems with different front-end representations and back-end classifiers. At the front-end level, feature extraction methods such as Mel-frequency cepstral coefficients (MFCCs) and two...
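A typical MFCC front-end of the kind named above can be reproduced with librosa; the 20-coefficient, 25 ms/10 ms configuration and the file path are illustrative assumptions:

```python
import librosa

# load an utterance at 16 kHz (hypothetical path)
audio, sr = librosa.load("utterance.wav", sr=16000)

# 20 MFCCs over 25 ms windows with a 10 ms hop
mfcc = librosa.feature.mfcc(
    y=audio, sr=sr, n_mfcc=20,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)
print(mfcc.shape)  # (20, n_frames)
```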
Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) the tree structure lacks adequate context generalization; (ii) it is unable to express complex context dependencies; (iii) parameters generated from it exhibit sudden transitions between adjacent states. In order...
Speech separation refers to extracting each individual speech source from a given mixed signal. Recent advancements and ongoing research in this area have made these approaches promising techniques for pre-processing of naturalistic audio streams. Since deep learning was incorporated into separation, system performance has been improving faster. The initial solutions analyzed the signals in the time-frequency domain with the STFT; the encoded features were then fed to a neural network separator. Most...
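The STFT-domain pipeline this abstract describes (encode, mask with a learned separator, decode) looks roughly like the following; the mask here is a random placeholder for the network output, and the 8 kHz/256-point settings are assumptions:

```python
import numpy as np
import librosa

# hypothetical two-speaker mixture at 8 kHz
mix, sr = librosa.load("mixture.wav", sr=8000)

# encode: complex spectrogram via STFT
spec = librosa.stft(mix, n_fft=256, hop_length=64)
mag, phase = np.abs(spec), np.angle(spec)

# a trained separator would predict one mask per source from `mag`
mask = np.random.rand(*mag.shape)  # placeholder for the network output
est_mag = mask * mag               # masked mixture magnitude

# decode: reuse the mixture phase and invert the STFT
estimate = librosa.istft(est_mag * np.exp(1j * phase), hop_length=64)
```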
Speech synthesis systems provided for the Persian language so far require various large-scale speech corpora to synthesize several target speakers' voices. Accordingly, synthesizing with a small amount of data seems to be essential for Persian. Taking advantage of speaker adaptation makes it possible to generate speech of remarkable quality when data are limited. Here we apply this method for the first time to Persian. This paper describes a system based on Hidden Markov Models (HMMs) built on the FARsi DATabase (FARSDAT). In this regard, we prepared the whole...
This article proposes a method to improve the performance of deterministic plus stochastic model (DSM)-based feature extraction by integrating contextual information. One precious advantage of speech synthesis over recognition is that in both the training and testing phases of synthesis, contextual information is available. However, as in recognition, this invaluable knowledge has been ignored during acoustic feature extraction for synthesis. The DSM expresses the residual of Mel-cepstral analysis through the summation of two components, namely...
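The DSM decomposition referenced here splits the excitation residual into a quasi-periodic deterministic part below a maximum voiced frequency and a noise-like stochastic part above it. A heavily simplified sketch of that split (the published method operates pitch-synchronously; the 4 kHz cutoff and filter order are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def dsm_split(residual, sr, fm=4000.0, order=4):
    """Split a residual signal at maximum voiced frequency `fm`:
    low band ~ deterministic component, high band ~ stochastic component.
    A simplification; the real DSM works on pitch-synchronous frames."""
    b, a = butter(order, fm / (sr / 2), btype="low")
    deterministic = filtfilt(b, a, residual)  # quasi-periodic low band
    stochastic = residual - deterministic     # noise-like high band
    return deterministic, stochastic
```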
We introduce a bilingual solution to support English as a secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments constitute: (a) a pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently a bilingual streaming transformer model, (c) a parallel encoder structure with a language identification (LID) loss, and (d) an auxiliary loss for monolingual projections. We conclude that, in comparison with LID, our proposed approach is superior...
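The auxiliary LID loss in (c) pairs a language classifier with the main ASR objective during training. A hedged multi-task sketch (the loss types and the 0.1 weight are assumptions, not the paper's recipe):

```python
import torch.nn as nn

asr_criterion = nn.CrossEntropyLoss()  # e.g. frame-level senone targets in hybrid ASR
lid_criterion = nn.CrossEntropyLoss()  # language labels for the LID head

def joint_loss(asr_logits, asr_targets, lid_logits, lid_targets, alpha=0.1):
    # total objective = ASR loss + alpha * auxiliary LID loss
    return (asr_criterion(asr_logits, asr_targets)
            + alpha * lid_criterion(lid_logits, lid_targets))
```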