- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Biometric Identification and Security
- Dermatoglyphics and Human Traits
- Natural Language Processing Techniques
- Forensic Fingerprint Detection Methods
- Vibration Control and Rheological Fluids
- Wind and Air Flow Studies
- Scientific Computing and Data Management
- Handwritten Text Recognition Techniques
- Graphite, nuclear technology, radiation studies
- Digital Games and Media
- Forensic and Genetic Research
- Phonetics and Phonology Research
- User Authentication and Security Systems
- Digital Media Forensic Detection
- Artificial Intelligence in Games
- Reinforcement Learning in Robotics
- Speech and dialogue systems
- Structural Engineering and Vibration Analysis
Shanghai Jiao Tong University
2021-2022
Chinese Academy of Sciences
2013-2018
Institute of Acoustics
2015-2018
Peking University
2017
Recently, speaker embeddings extracted from a speaker-discriminative deep neural network (DNN) yield better performance than conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross-entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness. As a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses which not only separate classes but...
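To make the idea of a margin-based loss concrete, here is a minimal sketch of one common form, an additive margin on the target-class cosine. The excerpt does not name the three losses the paper studies, so the function name `am_softmax_loss` and the parameters `margin` and `scale` are illustrative assumptions, not the authors' formulation.

```python
import math

def am_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Additive-margin softmax loss for a single sample (illustrative sketch).

    cosines: cosine similarities between the embedding and each class weight.
    target:  index of the true class.
    margin, scale: hypothetical hyperparameters chosen for illustration.
    """
    # Subtract the margin from the target-class cosine only, then scale all logits.
    logits = [
        scale * (c - margin) if i == target else scale * c
        for i, c in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Cross entropy = log-sum-exp minus the target logit.
    return log_sum - logits[target]

# With margin=0 this reduces to ordinary softmax cross entropy on scaled cosines.
plain = am_softmax_loss([0.9, 0.1], target=0, margin=0.0)
with_margin = am_softmax_loss([0.9, 0.1], target=0, margin=0.2)
```

Because the margin lowers only the target logit, the same sample incurs a larger loss than under plain softmax, which pushes the target cosine further above the others during training.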
Finger vein verification uses finger vein patterns to verify a person's identity and is widely used in various fields. In practice, the verification method is the most important part of a finger vein biometric system and determines the reliability of the system. In this paper, we propose methods called DeepVein for finger vein verification based on deep convolutional neural networks and conduct experiments to evaluate our methods. The experimental results show that the proposed methods can achieve state-of-the-art performance in accuracy. In addition, we present how the amount of data for training...
The variety and complexity of accents pose a huge challenge to robust Automatic Speech Recognition (ASR). Some previous work has attempted to address such problems; however, most of the current approaches either require prior knowledge about the target accent, or cannot handle unseen accents and accent-unspecific standard speech. In this work, we aim to improve multi-accent speech recognition in the end-to-end (E2E) framework with a novel layer-wise adaptation architecture. Firstly, we propose deep accent...
Tuned liquid dampers (TLDs) have received considerable attention as effective passive dynamic vibration absorbers for controlling wind-induced vibrations in high-rise buildings. However, due to the complex coupled response of the structure–TLD system, accurately evaluating the damping performance of a TLD is challenging. This study proposes a method to evaluate the damping performance of a TLD through the response. First, a state space model of the system is built. Second, a technique is used to determine the matrix and obtain the modal frequencies, damping ratios, mass ratios...
This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160-hour accented English data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature to accent identification and verify its efficacy. Then, a novel TTS-based approach is carefully designed to augment...
This paper presents Botzone, a competitive and interactive platform for game AI education, aiming to simplify the teaching process of game AI courses and inspire self-study learners. Botzone is a universal online platform, designed to evaluate different implementations of game AI by applying them to agents in a variety of games to compete with each other. It has been successfully used in various game AI courses for competition and practice, with the expandability to support more languages, as well as further usages such as researching machine learning on game AI. In this paper,...
Data augmentation is commonly used to help build a robust speaker verification system, especially in the limited-resource case. However, conventional data augmentation methods usually focus on the diversity of the acoustic environment, leaving the lexicon variation neglected. For text-dependent speaker verification tasks, it is well known that preparing training data with the target transcript is the most effectual approach to build a well-performing system; however, collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based...
This study introduces a linear Gaussian model-based framework for voice biometrics. The model works with discrete-time linear dynamical systems. The motivation is to use the linear Gaussian modelling method in voice biometrics, and to show that the accuracy offered by the method is comparable with other state-of-the-art methods such as Probabilistic Linear Discriminant Analysis and the two-covariance model. An expectation–maximisation algorithm is derived to train the model, and a Bayesian solution is used to calculate the log-likelihood ratio score of all trials of speakers. The approach...
Although the state‐of‐the‐art i‐vector‐based probabilistic linear discriminant analysis systems resulted in promising performances in the National Institute of Standards and Technology speaker recognition evaluations, the impact of domain mismatch when the system development data and the evaluation data are collected from different sources remains a challenging problem. This issue was the focus of the Johns Hopkins University 2013 speaker recognition workshop, where a domain adaptation challenge (DAC13) corpus was created to address it. The cross‐domain variation...
This report describes the SJTU-AISPEECH system for the Voxceleb Speaker Recognition Challenge 2022. For track1, we implemented two kinds of systems, the online system and the offline system. Different ResNet-based backbones and loss functions are explored. Our final fusion system achieved 3rd place in track1. For track3, we implemented statistic adaptation and jointly training based domain adaptation. In the jointly training based domain adaptation, we jointly trained the source and target dataset with different training objectives to do the domain adaptation. We explored two different training objectives for the target data, self-supervised learning based angular proto-typical...
With the rise of intelligent speech processing applications, quickly producing keyword spotting (KWS) models with low resources has gained particular importance in recent years. Multi-speaker text-to-speech (TTS) has been proved to be an effective data augmentation technique for KWS to help complement the inadequacies of training data. However, in previous works, a KWS system built with TTS-augmented data could not obtain performance comparable with that of a system trained with real recordings, as synthetic speeches could not fully represent...
Finger vein recognition is a biometric method utilizing the vein patterns inside one's fingers for personal identification. The recognition algorithm is the key part of a finger vein recognition system, dominating the system performance. There are usually a lot of parameters in recognition algorithms, and different parameter values could lead to different performance, so it is essential to set a proper value for each parameter in practice. In this paper, we conduct experiments to study how the parameters influence the performance measured by equal error rate. We have made two observations from the results: 1. When...
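Equal error rate, the metric named in this abstract, is the operating point where the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from match scores; the function name `equal_error_rate` and the threshold sweep are illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the equal error rate from similarity scores (illustrative sketch).

    genuine:  scores for same-identity comparisons (higher means more similar).
    impostor: scores for different-identity comparisons.
    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    # Sweep every observed score as a candidate threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Hypothetical scores: perfectly separated sets give an EER of zero.
separable = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping = equal_error_rate([0.6, 0.4], [0.5, 0.3])
```

Sweeping only observed scores keeps the sketch short. With few trials the two error rates may never be exactly equal, which is why the midpoint at the closest point is returned.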
In recent years, finger vein recognition has become an important sub-field in biometrics and has been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, the International Competition on Finger Vein Recognition (ICFVR) is held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically...
Recent studies have shown that when state-of-the-art probabilistic linear discriminant analysis (PLDA) speaker verification systems are developed with out-domain data, the mismatch between the development data and the evaluation data significantly degrades performance. An unsupervised cross-domain variation compensation (CDVC) approach to compensate for the domain mismatch is proposed. This approach is based on the assumption that the inter-domain variability is an additive factor with a normal distribution in the i-vector space. The effect of adaption...
Noisy condition is an important extrinsic degradation affecting speaker verification system performance. A feature-recovery approach is proposed to eliminate the noise-dependent variability in the feature space. A frame of the noisy feature vector is recovered using the information of the vector itself and its neighbour vectors. Experiments are conducted on noisy test sets for text-dependent speaker verification tasks, and the results indicate that the proposed approach can achieve significant performance improvement by
Unimodal biometric verification has developed a lot and become more accurate, but there is still not a perfect algorithm. In the meantime, cases exist where a unimodal system could not meet the requirements in practical use. It is proved that algorithms with the same overall accuracy may have different misclassified patterns. We make use of this complementation to fuse individual algorithms together for a more precise result. According to our observation, an algorithm's confidence on its decisions is seldom considered in fusion methods. Our...
Separated modeling methods are widely used in the text-dependent speaker verification task. The reason why they are so effective is discussed in this paper. A word-based scoring method is then proposed based on our discussion. Specifically, a segmentation algorithm is firstly used for segmenting the enrollment and test utterances into words automatically. Then every segment of the enrollment utterance is used to enroll a word model. Scoring is done with each testing segment and the corresponding model of the same word. The experiments are carried out on short duration...
Finger vein verification has developed a lot since its first proposal, but there is still not a perfect algorithm. It is proved that algorithms with the same overall accuracy may have different misclassified patterns. We could make use of this complementation to fuse individual algorithms together for a more precise result. According to our observation, an algorithm's confidence on its decisions is seldom considered in fusion methods. Our work is to define a decision reliability ratio to quantify this confidence, and then propose...
Many speaker recognition challenges have been held to assess speaker verification systems in the wild and probe the performance limit. The Voxceleb Speaker Recognition Challenge (VoxSRC), based on voxceleb, is the most popular. Besides, another challenge called the CN-Celeb Speaker Recognition Challenge (CNSRC) is also held this year, which is based on the Chinese celebrity multi-genre dataset CN-Celeb. This year, our team participated in both speaker verification closed tracks in CNSRC 2022 and VoxSRC 2022, and achieved the 1st place and 3rd place respectively. In most system reports, the authors usually only provide a description of...