- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Voice and Speech Disorders
- Phonetics and Phonology Research
- Advanced Data Compression Techniques
- Neural Networks and Applications
- Digital Media Forensic Detection
- Emotion and Mood Recognition
- Gaussian Processes and Bayesian Inference
- Anomaly Detection Techniques and Applications
- Educational Reforms and Innovations
- Blind Source Separation Techniques
- Artificial Intelligence in Games
- Dementia and Cognitive Impairment Research
- Categorization, perception, and language
- Identification and Quantification in Food
- Time Series Analysis and Forecasting
- Brain Tumor Detection and Classification
- Gaze Tracking and Assistive Technology
- Advanced Text Analysis Techniques
- Neural and Behavioral Psychology Studies
- Machine Learning in Healthcare
- Spectroscopy and Chemometric Analyses
- Dysphagia Assessment and Management
Institute of Software
2022-2025
Chinese Academy of Sciences
2021-2025
Human Computer Interaction (Switzerland)
2023
Shenzhen Institutes of Advanced Technology
2014-2022
Synergy University
2022
Chinese University of Hong Kong
2020-2022
Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded with co-occurring physical disabilities, lead to difficulty in collecting the large quantities of speech required for system development. This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. Both normal and disordered speech were exploited in the augmentation process. Variability among impaired speakers...
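To make the speed perturbation named in the abstract above concrete, here is a minimal sketch that resamples a waveform by linear interpolation. It is an illustration only, not the paper's pipeline: the function name `speed_perturb`, the factors 0.9 and 1.1, and the synthetic 440 Hz tone are all assumptions chosen for the example.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample a 1-D waveform so that it plays `factor` times faster.

    Hypothetical helper for illustration; factor > 1 shortens the signal,
    factor < 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = max(1, int(round(len(signal) / factor)))
    # Positions in the original signal at which the output is sampled.
    src_positions = np.arange(n_out) * factor
    # Clamp to the last valid index so rounding never reads past the end.
    src_positions = np.minimum(src_positions, len(signal) - 1)
    return np.interp(src_positions, np.arange(len(signal)), signal)


# Toy input (assumption): a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

slow = speed_perturb(tone, 0.9)  # longer than the input
fast = speed_perturb(tone, 1.1)  # shorter than the input
```

Resampling in this way changes both duration and pitch. Tempo perturbation, by contrast, changes duration while preserving pitch and needs a time-stretching algorithm that this sketch does not cover.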
Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained features into TDNN and Conformer systems for dysarthric and elderly speech recognition. These include: a) input feature fusion between standard acoustic frontends and SSL representations; b)...
In light of the growing proportion of older individuals in our society, the timely diagnosis of Alzheimer's disease has become a crucial aspect of healthcare. In this paper, we propose a non-invasive and cost-effective detection method based on speech technology. The method employs a pre-trained language model in conjunction with techniques such as prompt fine-tuning and conditional learning, thereby enhancing the accuracy and efficiency of the detection process. To address the issue of limited computational resources, this study employs the efficient LoRA method to construct...
Discrete tokens extracted provide efficient and domain adaptable speech features. Their application to disordered speech that exhibits articulation imprecision and large mismatch against normal voice remains unexplored. To improve their phonetic discrimination, which is weakened during unsupervised K-means or vector quantization of continuous features, this paper proposes novel phone-purity guided (PPG) discrete tokens for dysarthric speech recognition. Phonetic label supervision is used to regularize maximum likelihood...
Speaker adaptation techniques play a key role in reducing the mismatch between speech recognition systems and target users. In order to robustly learn speaker-dependent parameters, model based DNN adaptation techniques often require a significant amount of adaptation data. For example, in the commonly used learning hidden unit contributions (LHUC) based adaptation, high-dimensional hidden layer output scaling vectors are used. When limited adaptation data is available, standard LHUC adaptation is prone to over-fitting and poor generalization. To address this issue, Bayesian...
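The hidden layer output scaling described in the abstract above can be sketched as follows. This shows only the standard deterministic LHUC forward pass in its commonly used form, where each hidden unit is scaled by an amplitude `2 * sigmoid(r)`. It does not show the Bayesian variant the abstract goes on to propose. The function name `lhuc_scale`, the layer size, and the random activations are assumptions for the example.

```python
import numpy as np


def lhuc_scale(hidden: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Scale hidden activations by speaker-dependent amplitudes 2*sigmoid(r).

    hidden: (num_frames, num_units) hidden layer outputs.
    r:      (num_units,) speaker-dependent LHUC parameters.
    """
    amplitude = 2.0 / (1.0 + np.exp(-r))  # each value lies in (0, 2)
    return hidden * amplitude


rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 8))  # toy layer: 4 frames, 8 hidden units

# With r initialised to zero, every amplitude is 1, so the
# speaker-independent network is unchanged before adaptation.
r_init = np.zeros(8)
adapted = lhuc_scale(hidden, r_init)
```

Because `r` holds one parameter per hidden unit for every adapted layer, the number of speaker-dependent parameters grows with layer width, which is why limited adaptation data can lead to over-fitting.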
Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech in recent decades, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. Sources of heterogeneity commonly found in normal speech, including accent or gender, when further compounded with the variability over age and speech pathology severity level, create large diversity among speakers. To this end, speaker adaptation techniques play a key role in the personalization of ASR systems for such users. Motivated by...
Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty in collecting such data in large quantities. This paper explores a series of approaches to integrate domain adapted Self-Supervised Learning (SSL) pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends and wav2vec2.0 representations; b) frame-level joint decoding of systems separately trained using standard acoustic features alone and with...
Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to difficulty in collecting the large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized augmentation approaches that simultaneously learn to encode, generate and discriminate synthesized speech....
Challenge is the core element of digital games. The wide spectrum of physical, cognitive, and emotional challenge experiences provided by modern digital games can be evaluated subjectively using a questionnaire, the CORGIS, which allows for a post hoc evaluation of the overall experience that occurred during game play. Measuring this experience dynamically and objectively, however, would allow for a more holistic view of the moment-to-moment experiences of players. This study, therefore, explored the potential of detecting perceived challenge from physiological signals. For...
Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compact and data efficient speaker-dependent (SD) parameter representations are used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer...
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences. Speaker adaptation techniques play a vital role in reducing the mismatch. Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness. When the amount of speaker level data is limited, speaker adaptation is prone to overfitting and poor generalization. To address the issue, this paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty given...
A key challenge for automatic speech recognition (ASR) systems is to model the speaker level variability. In this paper, compact speaker dependent learning hidden unit contributions (LHUC) are used to facilitate both speaker adaptive training (SAT) and test time unsupervised speaker adaptation for state-of-the-art Conformer based end-to-end ASR systems. The sensitivity during adaptation to supervision error rate is reduced using confidence score based selection of the more "trustworthy" subset of speaker specific data. A confidence estimation module is used to smooth the over-confident...
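The confidence score based selection mentioned in the abstract above can be illustrated with a simple threshold filter over first-pass recognition hypotheses. This is a sketch under assumptions, not the paper's method: the `Hypothesis` record, the `select_trustworthy` helper, the threshold of 0.8, and the example utterances are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One first-pass recognition result for an utterance (hypothetical)."""
    utt_id: str
    text: str
    confidence: float  # utterance-level confidence score in [0, 1]


def select_trustworthy(hyps: List[Hypothesis], threshold: float = 0.8) -> List[Hypothesis]:
    """Keep only hypotheses whose confidence reaches the threshold."""
    return [h for h in hyps if h.confidence >= threshold]


# Toy first-pass outputs for one speaker (made-up values).
first_pass = [
    Hypothesis("utt1", "turn on the light", 0.93),
    Hypothesis("utt2", "open the door", 0.41),
    Hypothesis("utt3", "call my sister", 0.85),
]

# Only the selected subset would serve as supervision for adaptation.
adaptation_set = select_trustworthy(first_pass, threshold=0.8)
selected_ids = [h.utt_id for h in adaptation_set]
```

A higher threshold gives cleaner supervision but less adaptation data, so the threshold trades label quality against quantity.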
Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech. Their practical application to disordered speech recognition is often limited by the difficulty in collecting such specialist data from impaired speakers. This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training before...
Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and end-to-end Conformer ASR systems on such data. The handcrafted temporal and spectral mask operations in the standard SpecAugment method that are system dependent, together with additionally introduced minimum and maximum cut-offs of these masks, are now automatically...
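For orientation, the handcrafted temporal and spectral masks of standard SpecAugment referred to in the abstract above can be sketched as below. The sketch applies one frequency mask and one time mask with fixed maximum widths and omits time warping. It does not implement the RL-based policy learning the paper describes. The function name `spec_augment`, the default widths, and the toy spectrogram shape are assumptions.

```python
import numpy as np


def spec_augment(spec: np.ndarray, rng: np.random.Generator,
                 max_freq_width: int = 8, max_time_width: int = 20) -> np.ndarray:
    """Zero out one random frequency band and one random time span.

    spec: (num_frames, num_bins) spectrogram; the input is not modified.
    """
    out = spec.copy()
    num_frames, num_bins = out.shape

    # Frequency mask: width f in [0, max_freq_width], start f0 so the band fits.
    f = int(rng.integers(0, min(max_freq_width, num_bins) + 1))
    f0 = int(rng.integers(0, num_bins - f + 1))
    out[:, f0:f0 + f] = 0.0

    # Time mask: width t in [0, max_time_width], start t0 so the span fits.
    t = int(rng.integers(0, min(max_time_width, num_frames) + 1))
    t0 = int(rng.integers(0, num_frames - t + 1))
    out[t0:t0 + t, :] = 0.0
    return out


rng = np.random.default_rng(0)
spec = np.ones((100, 40))  # toy spectrogram: 100 frames, 40 filterbank bins
masked = spec_augment(spec, rng)
```

The maximum widths are hand-chosen hyperparameters here, which is the kind of task-specific tuning the abstract says is replaced by automatic learning.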
Recently deep neural networks (DNNs) have become increasingly popular for acoustic modelling in automatic speech recognition (ASR) systems. As the bottleneck features they produce are inherently discriminative and contain rich hidden factors that influence the surface acoustic realization, the standard approach is to augment the conventional acoustic features with the bottleneck features in a tandem framework. In this paper, an alternative approach to incorporate bottleneck features is investigated. The complex relationship between acoustic features and DNN bottleneck features is modelled using generalized variable parameter HMMs...