Houjun Huang

ORCID: 0000-0003-0757-0949
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Biometric Identification and Security
  • Dermatoglyphics and Human Traits
  • Natural Language Processing Techniques
  • Forensic Fingerprint Detection Methods
  • Vibration Control and Rheological Fluids
  • Wind and Air Flow Studies
  • Scientific Computing and Data Management
  • Handwritten Text Recognition Techniques
  • Graphite, nuclear technology, radiation studies
  • Digital Games and Media
  • Forensic and Genetic Research
  • Phonetics and Phonology Research
  • User Authentication and Security Systems
  • Digital Media Forensic Detection
  • Artificial Intelligence in Games
  • Reinforcement Learning in Robotics
  • Speech and dialogue systems
  • Structural Engineering and Vibration Analysis

Shanghai Jiao Tong University
2021-2022

Chinese Academy of Sciences
2013-2018

Institute of Acoustics
2015-2018

Peking University
2017

Recently, speaker embeddings extracted from a discriminative deep neural network (DNN) have yielded better performance than conventional methods such as i-vector. In most cases, the DNN classifier is trained using cross-entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses, which not only separate classes but...

10.1109/apsipaasc47483.2019.9023039 article EN 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2019-11-01
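
As an illustration of what a margin-based loss looks like, here is a minimal NumPy sketch of an additive-margin softmax loss. It is one common margin-based formulation and is not necessarily one of the three losses studied in the entry above, whose abstract is truncated. The function name `am_softmax_loss` and the default `margin` and `scale` values are assumptions chosen for illustration only.

```python
import numpy as np

def am_softmax_loss(embeddings, weights, labels, margin=0.2, scale=30.0):
    """Additive-margin softmax loss averaged over a batch (illustrative sketch).

    embeddings: (batch, dim) array, weights: (num_classes, dim) array,
    labels: (batch,) integer array. margin and scale are assumed defaults.
    """
    # Cosine similarity between L2-normalised embeddings and class weights.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = emb @ w.T  # shape (batch, num_classes)

    # Subtract the margin from the target-class cosine only, then rescale.
    rows = np.arange(len(labels))
    logits = cos.copy()
    logits[rows, labels] -= margin
    logits *= scale

    # Numerically stable log-softmax, then negative log-likelihood of the target.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

With `margin=0.0` this reduces to a softmax cross-entropy over scaled cosine similarities; a positive margin lowers the target logit, so each embedding must sit closer to its own class weight to reach the same loss.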

Finger vein verification uses finger vein patterns to verify a person's identity and is widely used in various fields. In practice, the verification method is the most important part of a finger vein biometric system and determines the reliability of the system. In this paper, we propose methods called DeepVein for finger vein verification based on deep convolutional neural networks and conduct experiments to evaluate our methods. The experimental results show that the proposed methods can achieve state-of-the-art performance in accuracy. In addition, we present how the amount of data for training...

10.1109/isba.2017.7947683 article EN 2017-02-01

The variety and complexity of accents pose a huge challenge to robust Automatic Speech Recognition (ASR). Some previous work has attempted to address such problems; however, most of the current approaches either require prior knowledge about the target accent or cannot handle unseen accents and accent-unspecific standard speech. In this work, we aim to improve multi-accent speech recognition in the end-to-end (E2E) framework with a novel layer-wise adaptation architecture. Firstly, we propose deep accent...

10.1109/taslp.2022.3198546 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2022-01-01

Tuned liquid dampers (TLDs) have received considerable attention as effective passive dynamic vibration absorbers for controlling wind-induced vibrations in high-rise buildings. However, due to the complex coupled response of the structure–TLD system, accurately evaluating the damping performance of a TLD is challenging. This study proposes a method to evaluate the damping performance of a TLD through the response. First, a state space model of the structure–TLD system is built. Second, a technique is used to determine the state matrix and obtain the modal frequencies, damping ratios, mass ratios...

10.2139/ssrn.5087109 preprint EN 2025-01-01

This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160-hour accented English data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce an ASR-based phone posteriorgram (PPG) feature and verify its efficacy. Then, a novel TTS approach is carefully designed to augment...

10.1109/icassp39728.2021.9414292 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Recently, speaker embeddings extracted from a discriminative deep neural network (DNN) have yielded better performance than conventional methods such as i-vector. In most cases, the DNN classifier is trained using cross-entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses, which not only separate classes but...

10.48550/arxiv.1906.07317 preprint EN other-oa arXiv (Cornell University) 2019-01-01

This paper presents Botzone, a competitive and interactive platform for game AI education, aiming to simplify the teaching process of game AI courses and to inspire self-study learners. Botzone is a universal online platform, designed to evaluate different implementations of game AI by applying them to agents in a variety of games to compete with each other. It has been successfully used in various courses and competitions in practice, with the expandability to support more games and languages, as well as further uses such as researching machine learning on game AI. In this paper,...

10.1145/3063955.3063961 article EN 2017-05-08

Data augmentation is commonly used to help build a robust speaker verification system, especially in the limited-resource case. However, conventional data augmentation methods usually focus on the diversity of the acoustic environment, leaving the lexicon variation neglected. For text-dependent speaker verification tasks, it is well known that preparing training data with the target transcript is the most effectual approach to build a well-performing system; however, collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based...

10.1109/icassp39728.2021.9414550 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

This study introduces a linear Gaussian model-based framework for voice biometrics. The model works with discrete-time linear dynamical systems. The motivation is to use the linear Gaussian modelling method in voice biometrics and to show that the accuracy it offers is comparable to that of other state-of-the-art methods such as Probabilistic Linear Discriminant Analysis and the two-covariance model. An expectation–maximisation algorithm is derived to train the model, and a Bayesian solution is used to calculate the log-likelihood ratio score of all trials of speakers. The approach...

10.1049/iet-bmt.2013.0027 article EN IET Biometrics 2013-08-19

Although the state‐of‐the‐art i‐vector‐based probabilistic linear discriminant analysis systems have resulted in promising performances in the National Institute of Standards and Technology speaker recognition evaluations, the impact of domain mismatch when the system development data and the evaluation data are collected from different sources remains a challenging problem. This issue was the focus of the Johns Hopkins University 2013 workshop, where a domain adaptation challenge (DAC13) corpus was created to address it. The cross‐domain variation...

10.1049/el.2015.3174 article EN Electronics Letters 2016-01-07

This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160-hour accented English data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce an ASR-based phone posteriorgram (PPG) feature and verify its efficacy. Then, a novel TTS approach is carefully designed to augment...

10.48550/arxiv.2102.09828 preprint EN other-oa arXiv (Cornell University) 2021-01-01

This report describes the SJTU-AISPEECH system for the Voxceleb Speaker Recognition Challenge 2022. For track1, we implemented two kinds of systems, the online system and the offline system. Different ResNet-based backbones and loss functions are explored. Our final fusion system achieved 3rd place in track1. For track3, we implemented statistic adaptation and jointly training based domain adaptation. In the jointly training based domain adaptation, we jointly trained the source and target dataset with different training objectives to do the domain adaptation. We explored different training objectives for the target domain data, self-supervised learning angular proto-typical...

10.48550/arxiv.2209.09076 preprint EN other-oa arXiv (Cornell University) 2022-01-01

With the rise of intelligent speech processing applications, quickly producing keyword spotting (KWS) models with low resources has gained particular importance in recent years. Multi-speaker text-to-speech (TTS) has been proved to be an effective data augmentation technique for KWS to help complement the inadequacies of training data. However, in previous works, the KWS system built with TTS augmented data could not obtain performance comparable to that of a system trained with real recordings, as synthetic speeches could not fully represent...

10.1109/iscslp57327.2022.10038031 article EN 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2022-12-11

Finger vein recognition is a biometric method utilizing the vein patterns inside one's fingers for personal identification. The recognition algorithm is the key part of a finger vein recognition system, dominating the system performance. There are usually a lot of parameters in the algorithms, and different parameter values could lead to different performance, so it is essential to set a proper value for each parameter in practice. In this paper, we conduct experiments to study how the parameters influence the performance measured by equal error rate. We have made two observations from the results: 1. When...

10.1109/isba.2017.7947697 article EN 2017-02-01
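
The entry above reports performance as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from two lists of similarity scores. The function name `equal_error_rate` and the convention that a higher score means "same identity" are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate (eer, threshold) from similarity scores (illustrative sketch).

    genuine: scores of same-identity trials; impostor: scores of
    different-identity trials. Higher score is assumed to mean a better match.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]
```

On finite score sets the two error curves rarely cross exactly, so this sketch reports the average of the two rates at the closest threshold; interpolating between neighbouring thresholds is a common refinement.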

In recent years, finger vein recognition has become an important sub-field in biometrics and has been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, the International Competition on Finger Vein Recognition (ICFVR) is held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically...

10.1109/btas.2017.8272760 article EN 2017-10-01

Recent studies have shown that when state-of-the-art probabilistic linear discriminant analysis (PLDA) speaker verification systems are developed with out-domain data, the mismatch between the development data and the evaluation data significantly degrades performance. An unsupervised cross-domain variation compensation (CDVC) approach to compensate for the domain mismatch is proposed. This approach is based on the assumption that the inter-domain variability is an additive factor with normal distribution in the i-vector space. The effect of adaption...

10.1049/el.2015.1701 article EN Electronics Letters 2015-10-01

Noisy condition is an important extrinsic degradation affecting speaker verification system performance. A feature-recovery approach is proposed to eliminate the noise-dependent variability in the feature space. A frame of the noisy feature vector is recovered using the information of the vector itself and its neighbour vectors. Experiments are conducted on noisy test sets for text-dependent speaker verification tasks, and the results indicate that the approach can achieve significant performance improvement by...

10.1049/el.2015.1418 article EN Electronics Letters 2015-08-13

Unimodal biometric verification has developed a lot and become more accurate, but there is still not a perfect algorithm. In the meantime, cases exist where a unimodal system could not meet the requirements in practical use. It is proved that algorithms with the same overall accuracy may have different misclassified patterns. We make use of this complementation to fuse individual algorithms together for a more precise result. According to our observation, an algorithm's confidence in its decisions is seldom considered in fusion methods. Our...

10.1145/3093293.3093310 article EN 2017-05-14

Separated modeling methods are widely used in the text-dependent speaker verification task. The reason why they are so effective is discussed in this paper. A word-based scoring method is then proposed based on our discussion. Specifically, a segmentation algorithm is firstly used for segmenting the enrollment and test utterances into words automatically. Then every segment of the enrollment utterance is used to enroll a word model. Scoring is done with each segment of the testing utterance and the corresponding word model of the same word. The experiments are carried out on short duration...

10.1109/iscslp.2018.8706618 article EN 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2018-11-01

Finger vein verification has developed a lot since its first proposal, but there is still not a perfect algorithm. It is proved that algorithms with the same overall accuracy may have different misclassified patterns. We could make use of this complementation to fuse individual algorithms together for a more precise result. According to our observation, an algorithm's confidence in its decisions is seldom considered in fusion methods. Our work is to define a decision reliability ratio to quantify this confidence, and then propose...

10.48550/arxiv.1612.05712 preprint EN other-oa arXiv (Cornell University) 2016-01-01

In recent years, finger vein recognition has become an important sub-field in biometrics and has been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, the International Competition on Finger Vein Recognition (ICFVR) is held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically...

10.48550/arxiv.1801.01262 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Many speaker recognition challenges have been held to assess speaker verification systems in the wild and probe the performance limit. The Voxceleb Speaker Recognition Challenge (VoxSRC), based on voxceleb, is the most popular. Besides, another challenge called the CN-Celeb Speaker Recognition Challenge (CNSRC) is also held this year, which is based on the Chinese celebrity multi-genre dataset CN-Celeb. This year, our team participated in both speaker verification closed tracks in CNSRC 2022 and VoxSRC 2022, and achieved 1st place and 3rd place respectively. In most system reports, the authors usually only provide a description of...

10.48550/arxiv.2211.00815 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Data augmentation is commonly used to help build a robust speaker verification system, especially in the limited-resource case. However, conventional data augmentation methods usually focus on the diversity of the acoustic environment, leaving the lexicon variation neglected. For text-dependent speaker verification tasks, it is well known that preparing training data with the target transcript is the most effectual approach to build a well-performing system; however, collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based...

10.48550/arxiv.2102.09817 preprint EN other-oa arXiv (Cornell University) 2021-01-01