- Speech Recognition and Synthesis
- Speech and Audio Processing
- Voice and Speech Disorders
- Music and Audio Processing
- Phonetics and Phonology Research
- Cleft Lip and Palate Research
- Natural Language Processing Techniques
- Stuttering Research and Treatment
- Topic Modeling
- Head and Neck Cancer Studies
- Language Development and Disorders
- Text Readability and Simplification
- Industrial Vision Systems and Defect Detection
- Emotion and Mood Recognition
- Speech and Dialogue Systems
- Fault Detection and Control Systems
- Infection Control and Ventilation
- Advanced Data Compression Techniques
- Dysphagia Assessment and Management
- Advanced Surface Polishing Techniques
- Context-Aware Activity Recognition Systems
- Sentiment Analysis and Opinion Mining
- Sparse and Compressive Sensing Techniques
- Non-Destructive Testing Techniques
- Embedded Systems Design Techniques
Georg Simon Ohm University of Applied Sciences Nuremberg
2020-2025
Intel (United States)
2017-2023
Friedrich-Alexander-Universität Erlangen-Nürnberg
2007-2022
Intel (Germany)
2016-2018
Universitätsklinikum Erlangen
2010-2012
SRI International
2009
Menlo School
2009
The INTERSPEECH 2012 Speaker Trait Challenge provides for the first time a unified test-bed for 'perceived' speaker traits: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In this paper, we describe these three Sub-Challenges, the Challenge conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants.
This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation, respectively. For the second approach, a GMM is trained for each speaker of the test and training set. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different...
70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies revealed that voice and prosody are one of the earliest indicators of PD. The issue of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts and monologues. Classification is performed in either case by SVMs. A correlation-based feature...
Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy to apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such...
Articulation and phonation are affected in 70% to 90% of patients with Parkinson’s disease (PD). This study focuses on the question whether speech carries information about (1) PD being present at a speaker or not, and (2) estimating the severity of PD (if present). We first perform classification experiments focusing on the automatic detection of PD as a 2-class problem (PD vs. healthy speakers). The severity of PD is described as a 3-class task based on Unified Parkinson’s Disease Rating Scale (UPDRS) ratings. We employ acoustic, prosodic and glottal features...
The SRI speaker recognition system for the 2008 NIST speaker recognition evaluation (SRE) incorporates a variety of models and features, both cepstral and stylistic. We highlight the improvements made to specific subsystems and analyze the performance of various subsystem combinations in different data conditions. We show the importance of language and nativeness conditioning, as well as the role of ASR for speaker verification.
Information from different bio-signals such as speech, handwriting, and gait has been used to monitor the state of Parkinson's disease (PD) patients; however, all multimodal information may not always be available. We propose a method based on multi-view representation learning via generalized canonical correlation analysis (GCCA) for features extracted from handwriting and gait that can complement speech-based features. Three problems are addressed: classification of PD patients vs. healthy controls, prediction...
Tooth loss and its prosthetic rehabilitation significantly affect speech intelligibility. However, little is known about the influence of speech deficiencies on oral health-related quality of life (OHRQoL). The aim of this study was to investigate whether speech intelligibility enhancement through prosthetic rehabilitation significantly influences OHRQoL in patients wearing complete maxillary dentures. Speech intelligibility by means of an automatic speech recognition system (ASR) was prospectively evaluated and compared with subjectively assessed Oral Health Impact Profile (OHIP)...
Self-supervised learning has been successfully used for various speech-related tasks, including automatic speech recognition. BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) has achieved state-of-the-art results in speech recognition. In this work, we further optimize the BEST-RQ approach using Kullback-Leibler divergence as an additional regularizing loss and a multi-codebook extension per cluster derived from low-level feature clustering. Preliminary experiments on the train-100 split of LibriSpeech...
We present an approach to Audio-Visual Speech Recognition that builds on a pre-trained Whisper model. To infuse visual information into this audio-only model, we extend it with an AV fusion module and LoRA adapters, one of the most up-to-date adapter approaches. One advantage of adapter-based approaches is that only a relatively small number of parameters are trained, while the basic model remains unchanged. Common AVSR approaches train single models to handle several noise categories and noise levels simultaneously....
Deploying large language models (LLMs) in real-world applications requires robust safety guard models to detect and block harmful user prompts. While large safety guard models achieve strong performance, their computational cost is substantial. To mitigate this, smaller distilled models are used, but they often underperform on "hard" examples where the larger model provides accurate predictions. We observe that many inputs can be reliably handled by the smaller model, while only a small fraction require the larger model's capacity. Motivated by this, we propose...
Dental rehabilitation of edentulous patients with complete dentures includes not only aesthetics and mastication of food, but also speech quality. It was the aim of this study to introduce and validate a computer-based speech recognition system (ASR) for automatic speech assessment in edentulous patients after dental rehabilitation with complete dentures. To examine the impact of dentures on speech production, the speech outcome of edentulous patients with and without complete dentures was compared. Twenty-eight patients reading a standardized text were recorded twice, with and without their complete dentures in situ. A control group of 40 healthy subjects with natural dentition was recorded under the same...
This paper focuses on the automatic recognition of a person’s age and gender based only on his or her voice. Up to five different systems are compared and combined in different configurations: three systems model the speaker’s characteristics in different feature spaces, i.e., MFCC, PLP, and TRAPS, by Gaussian mixture models. The features of these systems are the concatenated mean vectors. System number 4 uses a physical two-mass vocal model and estimates in a data-driven optimization procedure 9 glottal features from voiced speech sections. For each utterance the minimum, maximum...
We describe a new GMM-UBM speaker recognition system that uses standard cepstral features, but selects different frames of speech for different subsystems. Subsystems, or "constraints", are based on syllable-level information and combined at the score level. Results on both the NIST 2006 and 2008 test data sets for the English telephone train and test condition reveal that a set of eight constraints performs extremely well, resulting in better performance than other commonly-used cepstral models. Given the still largely-unexplored world...
Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, only making use of acoustic...