NFDI4DS | UHH-SEMS - Publication Details

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

OPENALEX - Publications

Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu

Recently, speaker embeddings extracted from a discriminative deep neural network (DNN) have yielded better performance than conventional methods such as the i-vector. In most cases, the DNN classifier is trained using cross-entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses which not only separate classes but...

10.1109/apsipaasc47483.2019.9023039 article EN 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2019-11-01

DeepVein: Novel finger vein verification methods based on Deep Convolutional Neural Networks

OPENALEX - Publications

Houjun Huang, Shilei Liu, Zheng He, Liao Ni, Yi Zhang and 1 more

Finger vein verification uses finger vein patterns to verify a person's identity and is widely used in various fields. In practice, the verification method is the most important part of the biometric system, as it determines the reliability of the system. In this paper, we propose methods called DeepVein for finger vein verification based on deep convolutional neural networks and conduct experiments to evaluate our methods. The experimental results show that the proposed methods can achieve state-of-the-art performance in accuracy. In addition, we present how the amount of training data...

10.1109/isba.2017.7947683 article EN 2017-02-01

Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

OPENALEX - Publications

Yanmin Qian, Xun Gong, Houjun Huang

The variety and complexity of accents pose a huge challenge to robust Automatic Speech Recognition (ASR). Some previous work has attempted to address such problems; however, most of the current approaches either require prior knowledge about the target accent or cannot handle unseen accents and accent-unspecific standard speech. In this work, we aim to improve multi-accent speech recognition in the end-to-end (E2E) framework with a novel layer-wise adaptation architecture. Firstly, we propose a deep accent...

10.1109/taslp.2022.3198546 article EN IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022-01-01

Damping Performance Evaluation of Tuned Liquid Dampers for Wind-Induced Vibration Control of High-Rise Buildings

OPENALEX - Publications

Lanfang Zhang, Zijie Zhou, Zhuangning Xie, Lele Zhang, Houjun Huang

Tuned liquid dampers (TLDs) have received considerable attention as effective passive dynamic vibration absorbers for controlling wind-induced vibrations in high-rise buildings. However, due to the complex coupled response of the structure–TLD system, accurately evaluating the damping performance of a TLD is challenging. This study proposes a method to evaluate the damping performance of a TLD through the system response. First, a state space model of the structure–TLD system is built. Second, a technique is used to determine the system matrix and obtain the modal frequencies, damping ratios, mass ratios...

10.2139/ssrn.5087109 preprint EN 2025-01-01

AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge

OPENALEX - Publications

Houjun Huang, Xu Xiang, Yexin Yang, Rao Ma, Yanmin Qian

This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160 hours of accented data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust system, we explore the whole pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature and verify its efficacy. Then, a novel TTS-based approach is carefully designed to augment...

10.1109/icassp39728.2021.9414292 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

OPENALEX - Publications

Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu and 1 more

10.21437/interspeech.2023-1217 article EN Interspeech 2023 2023-08-14

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

OPENALEX - Publications

Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu

Recently, speaker embeddings extracted from a discriminative deep neural network (DNN) have yielded better performance than conventional methods such as the i-vector. In most cases, the DNN classifier is trained using cross-entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses which not only separate classes but...

10.48550/arxiv.1906.07317 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Botzone

OPENALEX - Publications

Haoyu Zhou, Yushan Zhou, Haifeng Zhang, Houjun Huang, Wenxin Li

This paper presents Botzone, a competitive and interactive platform for game AI education, aiming to simplify the teaching process of courses and to inspire self-study learners. Botzone is a universal online platform designed to evaluate different implementations by applying them as agents that compete with each other in a variety of games. It has been successfully used in various competitions and practice, and it has the expandability to support more languages, as well as further usages such as research on machine learning in game AI. In this paper,...

10.1145/3063955.3063961 article EN 2017-05-08

Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification

OPENALEX - Publications

Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang, Yanmin Qian

Data augmentation is commonly used to help build a robust speaker verification system, especially in the limited-resource case. However, conventional data augmentation methods usually focus on the diversity of the acoustic environment, leaving lexicon variation neglected. For text-dependent speaker verification tasks, it is well known that preparing training data with the target transcript is the most effectual approach to build a well-performing speaker verification system; however, collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based...

10.1109/icassp39728.2021.9414550 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Voice biometrics using linear Gaussian model

OPENALEX - Publications

Hai Yang, Yunfei Xu, Houjun Huang, Ruohua Zhou, Yonghong Yan

This study introduces a linear Gaussian model-based framework for voice biometrics. The model works with discrete-time dynamical systems. The motivation is to use the modelling method in voice biometrics and to show that the accuracy offered by the linear Gaussian model is comparable with other state-of-the-art methods such as Probabilistic Linear Discriminant Analysis and the two-covariance model. An expectation–maximisation algorithm is derived to train the model, and a Bayesian solution is used to calculate the log-likelihood ratio score of all trials of speakers. The approach...

10.1049/iet-bmt.2013.0027 article EN IET Biometrics 2013-08-19

Robust speaker recognition using library of cross‐domain variation compensation transforms

OPENALEX - Publications

Houjun Huang, Shengyu Yao, Ruohua Zhou, Yonghong Yan

Although the state‐of‐the‐art i‐vector‐based probabilistic linear discriminant analysis systems have resulted in promising performance in the National Institute of Standards and Technology speaker recognition evaluations, the impact of domain mismatch when the system development data and evaluation data are collected from different sources remains a challenging problem. This issue was the focus of the Johns Hopkins University 2013 workshop, where the domain adaptation challenge (DAC13) corpus was created to address it. The cross‐domain variation...

10.1049/el.2015.3174 article EN Electronics Letters 2016-01-07

AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

OPENALEX - Publications

Houjun Huang, Xu Xiang, Yexin Yang, Rao Ma, Yanmin Qian

This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160 hours of accented data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust system, we explore the whole pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature and verify its efficacy. Then, a novel TTS-based approach is carefully designed to augment...

10.48550/arxiv.2102.09828 preprint EN other-oa arXiv (Cornell University) 2021-01-01

SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022

OPENALEX - Publications

Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu and 1 more

This report describes the SJTU-AISPEECH system for the VoxCeleb Speaker Recognition Challenge 2022. For track 1, we implemented two kinds of systems: an online system and an offline system. Different ResNet-based backbones and loss functions are explored. Our final fusion system achieved 3rd place in track 1. For track 3, we implemented statistic adaptation and jointly training based domain adaptation. In the jointly training based domain adaptation, we jointly trained on the source and target domain datasets with different training objectives to do domain adaptation. We explored different training objectives for the target domain data: self-supervised learning based angular proto-typical...

10.48550/arxiv.2209.09076 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Speaking style compensation on synthetic audio for robust keyword spotting

OPENALEX - Publications

Houjun Huang, Yanmin Qian

With the rise of intelligent speech processing applications, quickly producing keyword spotting (KWS) models with low resources has gained particular importance in recent years. Multi-speaker text-to-speech (TTS) has been proved to be an effective data augmentation technique for KWS, helping to complement the inadequacies of training data. However, in previous works, the KWS system built with TTS-augmented data could not obtain performance comparable to that of a system trained with real recordings, as synthetic speech could not fully represent...

10.1109/iscslp57327.2022.10038031 article EN 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2022-12-11

Parameter adjustment of finger vein recognition algorithms

OPENALEX - Publications

Zheng He, Yapeng Ye, Shilei Liu, Liao Ni, Yi Zhang and 2 more

Finger vein recognition is a biometric method utilizing the patterns inside one's fingers for personal identification. The recognition algorithm is the key part of a finger vein recognition system, dominating the system performance. There are usually a lot of parameters in recognition algorithms, and different parameter values could lead to different performance, so it is essential to set a proper value for each parameter in practice. In this paper, we conduct experiments to study how parameter values influence performance, as measured by equal error rate. We have made two observations from the results: 1. When...

10.1109/isba.2017.7947697 article EN 2017-02-01

ICFVR 2017: 3rd international competition on finger vein recognition

OPENALEX - Publications

Yi Zhang, Houjun Huang, Haifeng Zhang, Liao Ni, Wei Xu and 6 more

In recent years, finger vein recognition has become an important sub-field in biometrics and has been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, the International Competition on Finger Vein Recognition (ICFVR) was held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically...

10.1109/btas.2017.8272760 article EN 2017-10-01

Cross‐domain variation compensation for robust speaker verification

OPENALEX - Publications

Houjun Huang, Ruohua Zhou, Yonghong Yan

Recent studies have shown that when state-of-the-art probabilistic linear discriminant analysis (PLDA) speaker verification systems are developed with out-of-domain data, the mismatch between development data and evaluation data significantly degrades performance. An unsupervised cross-domain variation compensation (CDVC) approach to compensate for the domain mismatch is proposed. This approach is based on the assumption that inter-domain variability is an additive factor with a normal distribution in the i-vector space. The effect of adaption...

10.1049/el.2015.1701 article EN Electronics Letters 2015-10-01

Feature recovery for noise‐robust speaker verification

OPENALEX - Publications

Houjun Huang, Yunfei Xu, Ruohua Zhou, Yonghong Yan

Noisy conditions are an important extrinsic degradation affecting speaker verification system performance. A feature-recovery approach is proposed to eliminate noise-dependent variability in the feature space. Each noisy feature vector frame is recovered using information from the vector itself and its neighbour vectors. Experiments are conducted on test sets for text-dependent speaker verification tasks, and the results indicate that the approach can achieve significant performance improvement by...

10.1049/el.2015.1418 article EN Electronics Letters 2015-08-13

A Decision Reliability Ratio Based Fusion Scheme for Biometric Verification

OPENALEX - Publications

Liao Ni, Yi Zhang, Shilei Liu, Houjun Huang, Wenxin Li

Unimodal biometric verification has developed a lot and become more accurate, but there is still no perfect algorithm. In the meantime, cases exist where a unimodal system could not meet the requirements in practical use. It has been proved that algorithms with the same overall accuracy may have different misclassified patterns. We make use of this complementation to fuse individual algorithms together for a more precise result. According to our observation, an algorithm's confidence in its decisions is seldom considered in fusion methods. Our...

10.1145/3093293.3093310 article EN 2017-05-14

Text-dependent Speaker Verification Using Word-based Scoring

OPENALEX - Publications

Shengyu Yao, Houjun Huang, Ruohua Zhou, Yonghong Yan

Separated modeling methods are widely used in the text-dependent speaker verification task. The reason why they are so effective is discussed in this paper. A word-based scoring method is then proposed based on our discussion. Specifically, a segmentation algorithm is first proposed for automatically segmenting enrollment and test utterances into words. Then every segment of the enrollment utterance is used to enroll a word model. Scoring is done with each testing segment and the corresponding model of the same word. The experiments are carried out on short duration...

10.1109/iscslp.2018.8706618 article EN 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2018-11-01

A Fusion Method Based on Decision Reliability Ratio for Finger Vein Verification

OPENALEX - Publications

Liao Ni, Yi Zhang, Zheng He, Shilei Liu, Houjun Huang and 1 more

Finger vein verification has developed a lot since its first proposal, but there is still no perfect algorithm. It has been proved that algorithms with the same overall accuracy may have different misclassified patterns. We could make use of this complementation to fuse individual algorithms together for a more precise result. According to our observation, an algorithm's confidence in the decisions it makes is seldom considered in fusion methods. In our work, we define a decision reliability ratio to quantify this confidence, and then propose...

10.48550/arxiv.1612.05712 preprint EN other-oa arXiv (Cornell University) 2016-01-01

ICFVR 2017: 3rd International Competition on Finger Vein Recognition

OPENALEX - Publications

Yi Zhang, Houjun Huang, Haifeng Zhang, Liao Ni, Wei Xu and 6 more

In recent years, finger vein recognition has become an important sub-field in biometrics and has been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, the International Competition on Finger Vein Recognition (ICFVR) was held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically...

10.48550/arxiv.1801.01262 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

OPENALEX - Publications

Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu and 1 more

Many speaker recognition challenges have been held to assess speaker verification systems in the wild and probe the performance limit. The VoxCeleb Speaker Recognition Challenge (VoxSRC), based on VoxCeleb, is the most popular. In addition, another challenge, the CN-Celeb Speaker Recognition Challenge (CNSRC), was also held this year, based on the Chinese celebrity multi-genre dataset CN-Celeb. This year, our team participated in both closed tracks of CNSRC 2022 and VoxSRC 2022, achieving 1st place and 3rd place, respectively. In most reports, the authors usually only provide a description of...

10.48550/arxiv.2211.00815 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Unit selection synthesis based data augmentation for fixed phrase speaker verification

OPENALEX - Publications

Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang, Yanmin Qian

Data augmentation is commonly used to help build a robust speaker verification system, especially in the limited-resource case. However, conventional data augmentation methods usually focus on the diversity of the acoustic environment, leaving lexicon variation neglected. For text-dependent speaker verification tasks, it is well known that preparing training data with the target transcript is the most effectual approach to build a well-performing speaker verification system; however, collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based...

10.48550/arxiv.2102.09817 preprint EN other-oa arXiv (Cornell University) 2021-01-01