Georg Stemmer

ORCID: 0009-0008-9871-2423
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Speech and dialogue systems
  • Natural Language Processing Techniques
  • Topic Modeling
  • Voice and Speech Disorders
  • Phonetics and Phonology Research
  • Neural Networks and Applications
  • Advanced Data Compression Techniques
  • Robotics and Sensor-Based Localization
  • Advanced Image and Video Retrieval Techniques
  • Geographic Information Systems Studies
  • Gene expression and cancer classification
  • Industrial Vision Systems and Defect Detection
  • Robotics and Automated Systems
  • Welding Techniques and Residual Stresses
  • Emotion and Mood Recognition
  • Algorithms and Data Compression
  • Multimodal Machine Learning Applications
  • Digital Communication and Language
  • Time Series Analysis and Forecasting
  • Advanced Text Analysis Techniques
  • Face recognition and analysis
  • Sensor Technology and Measurement Systems

Intel (Germany)
2021

Intel (United States)
2015-2018

Intel (United Kingdom)
2017

Friedrich-Alexander-Universität Erlangen-Nürnberg
1999-2014

Siemens (Germany)
2006-2010

Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare
2006

70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies revealed that voice and prosody are among the earliest indicators of PD. The aim of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features, and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts, and monologues. Classification is performed in each case by SVMs. A correlation-based feature...

10.1109/asru.2011.6163978 article EN 2011-12-01
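
The abstract breaks off at "A correlation-based feature...". Assuming this refers to correlation-based feature selection in the style of Hall's CFS (not confirmed by the source), a minimal sketch of the merit score and a greedy forward search could look like this. The merit rewards features that correlate with the target and penalizes features that correlate with each other; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[int], features: List[Sequence[float]],
              target: Sequence[float]) -> float:
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) (Hall-style CFS)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features: List[Sequence[float]], target: Sequence[float]) -> List[int]:
    """Forward selection: add the best feature while the merit strictly improves."""
    selected: List[int] = []
    best = 0.0
    remaining = set(range(len(features)))
    while remaining:
        cand, score = max(
            ((i, cfs_merit(selected + [i], features, target)) for i in sorted(remaining)),
            key=lambda item: item[1],
        )
        if score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected
```

With this merit, a second feature that merely duplicates an already selected one does not raise the score, so redundant features are skipped.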

Adaptive training aims at reducing the influence of speaker, channel and environment variability on the acoustic models. We describe a normalization approach to adaptive training. Phonetically irrelevant variability is reduced at the beginning of the training procedure w.r.t. a set of target models. The target models can be HMMs or a Gaussian mixture model (GMM). CMLLR is applied to normalize the features. The normalized data contains less unwanted variability and is used to generate and train the recognition models. Employing a GMM as the target model leads to a text-independent normalization that can be embedded into the front-end. On broadcast...

10.1109/icassp.2005.1415284 article EN 2006-10-11

We describe a general-purpose end-to-end audio embeddings generator that can be easily adapted to various acoustic scene and event classification applications. In contrast to many other models for classification, this model does not require a separate feature extraction step but processes audio samples directly, which simplifies its porting onto hardware platforms. Our approach learns a generic embedding representation that is pre-trained on a large dataset. It is then fine-tuned via transfer learning with limited data...

10.1109/icassp39728.2021.9414229 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Young speakers are not represented adequately in current speech recognizers. In this paper we focus on the problem of adapting the acoustic front-end of a speech recognizer that has been trained on adults’ speech to achieve better performance on speech from children. We introduce and evaluate a method to perform non-linear VTLN by an unconstrained data-driven optimization of the filterbank. A second approach normalizes the speaking rate of young speakers with the PSOLA algorithm. Significant reductions in word error rate have been achieved.

10.21437/eurospeech.2003-415 article EN 2003-09-01

This paper focuses on the automatic recognition of a person’s age and gender based only on his or her voice. Up to five different systems are compared and combined in different configurations: three systems model the speaker’s characteristics in different feature spaces, i.e., MFCC, PLP, and TRAPS, by Gaussian mixture models. The features of these systems are the concatenated mean vectors. System number 4 uses a physical two-mass model of the vocal folds to estimate, by a data-driven optimization procedure, 9 glottal parameters from voiced speech sections. For each utterance, the minimum, maximum...

10.21437/interspeech.2010-748 article EN Interspeech 2010 2010-09-26
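
"Concatenated mean vectors" describes what is commonly called a GMM supervector. The abstract does not say how the speaker-specific means are obtained; the sketch below assumes relevance-MAP adaptation of a background GMM's means (a common choice, not confirmed by the source) with a hypothetical relevance factor of 16, diagonal covariances, and hypothetical function names.

```python
import math
from typing import List

Vector = List[float]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_supervector(frames: List[Vector], weights: List[float], means: List[Vector],
                    variances: List[Vector], relevance: float = 16.0) -> Vector:
    """MAP-adapt the GMM means to the frames and concatenate them into one vector."""
    num_comp, dim = len(weights), len(means[0])
    occupancy = [0.0] * num_comp                       # n_c = sum_t gamma_c(t)
    first_order = [[0.0] * dim for _ in range(num_comp)]  # sum_t gamma_c(t) * x_t
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)                                # log-sum-exp for stable posteriors
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for c in range(num_comp):
            gamma = post[c] / total
            occupancy[c] += gamma
            for d in range(dim):
                first_order[c][d] += gamma * x[d]
    supervector: Vector = []
    for c in range(num_comp):
        alpha = occupancy[c] / (occupancy[c] + relevance)  # data-dependent adaptation weight
        for d in range(dim):
            data_mean = first_order[c][d] / occupancy[c] if occupancy[c] > 0 else means[c][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[c][d])
    return supervector
```

Components that see little data stay close to the background means, so the supervector length is always the number of components times the feature dimension.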

The paper deals with the development of acoustic models for foreign words for a German speech recognizer. The recognition quality of foreign words is crucial for the overall performance of the system in application fields like spoken dialogue systems, where foreign words occur as proper names. One of the main problems in modeling foreign words is the limited training data, which must contain samples of the non-native pronunciation of the sounds. In order to obtain robust models that are still precise enough, we compare several methods to map or merge phonemes that are pronounced in a similar way by...

10.21437/eurospeech.2001-642 article EN 2001-09-03

We develop an acoustic feature set for the estimation of a person’s age from a recorded speech signal. The baseline features are Mel-frequency cepstral coefficients (MFCCs), which are extended by various prosodic features, pitch and formant frequencies. From experiments on the University of Florida Vocal Aging Database we can draw different conclusions. On the one hand, adding prosodic, pitch and formant features to the MFCCs leads to relative reductions of the mean absolute error between 4% and 20%. Improvements are even larger when perceptual labels are taken...

10.21437/interspeech.2009-740 article EN Interspeech 2009 2009-09-06
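
The reported gains are relative reductions of the mean absolute error (MAE). A small sketch of both quantities; the ages in the usage check are hypothetical and are not results from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between estimated and true ages (in years)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Fraction by which new_mae is lower than baseline_mae (0.2 means 20%)."""
    return (baseline_mae - new_mae) / baseline_mae
```

For example, lowering a hypothetical baseline MAE of 10 years to 8 years is a relative reduction of 0.2, i.e. 20%.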

Considering the dereverberation problem using multichannel processing, two main paradigms exist. The first paradigm utilizes the long-term correlation of the reverberant component to reduce it, e.g., Weighted Prediction Error (WPE) [1]. The second paradigm treats reverberation as a diffuse noise field, statistically independent of the direct speech component, and aims to reduce it with a superdirective beamformer, e.g., [2]. Here we propose to combine both paradigms in a two-stage algorithm. The first stage comprises the WPE method, Minimum Variance...

10.1109/icassp.2017.7952195 article EN 2017-03-01

We introduce a new technique to improve the recognition of non-native speech. The underlying assumption is that for each pronunciation of a speech sound, there is at least one sound in the target language that has a similar native pronunciation. The adaptation is performed by HMM interpolation between adequate acoustic models. The interpolation partners are determined automatically in a data-driven manner. Our experiments show that this technique is suitable both for offline adaptation to a whole group of speakers and for unsupervised online adaptation to a single speaker. Results are given...

10.21437/interspeech.2004-11 article EN Interspeech 2004 2004-10-04
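
The abstract does not give the interpolation formula or the partner criterion. A minimal sketch, assuming that "HMM interpolation" means a linear interpolation of the emission densities of a native model and its partner model, b(x) = λ·b_native(x) + (1 − λ)·b_partner(x), and that the partner is the candidate with the highest log-likelihood on non-native samples (one possible data-driven criterion, not confirmed by the source). Models are 1-D Gaussian mixtures for brevity; λ (`lam`) and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (weight, mean, variance) per mixture component; 1-D for brevity
Gmm = List[Tuple[float, float, float]]


def gmm_pdf(x: float, gmm: Gmm) -> float:
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
               for w, m, v in gmm)


def interpolated_pdf(x: float, native: Gmm, partner: Gmm, lam: float) -> float:
    """Emission density lam * b_native(x) + (1 - lam) * b_partner(x)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * gmm_pdf(x, native) + (1.0 - lam) * gmm_pdf(x, partner)


def choose_partner(samples: Sequence[float], candidates: Dict[str, Gmm]) -> str:
    """Pick the candidate model with the highest total log-likelihood on the samples."""
    def log_likelihood(gmm: Gmm) -> float:
        # floor avoids log(0) for samples far from every component
        return sum(math.log(max(gmm_pdf(x, gmm), 1e-300)) for x in samples)
    return max(candidates, key=lambda name: log_likelihood(candidates[name]))
```

Because the result is a convex combination of two densities, it remains a valid density for any λ in [0, 1].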

For many aspects of speech therapy, an objective evaluation of the intelligibility of a patient's speech is needed. We investigate this evaluation by means of automatic speech recognition. Previous studies have shown that measures like word accuracy are consistent with human experts' ratings. To ease the burden on patients, it is highly desirable to conduct the assessment via phone. However, the telephone channel influences the quality of the speech signal, which negatively affects the results. To reduce these inaccuracies, we propose a combination of two speech recognizers. Experiments on sets...

10.1109/asru.2007.4430200 article EN 2007-01-01
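
Word accuracy is used here as the intelligibility measure without being defined. A minimal sketch, assuming the conventional definition WA = (N − S − D − I) / N, where N is the number of reference words and S, D and I count substitutions, deletions and insertions in a minimum-edit-distance alignment; the function name is hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """WA = 1 - (S + D + I) / N via a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + int(ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return 1.0 - prev[-1] / len(ref)
```

Unlike a plain recognition rate, WA penalizes insertions, so it can become negative when the recognizer outputs many extra words.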

In this work we explore the application of AI to robotic welding. Robotic welding is a widely used technology in many industries, but robots currently do not have the capability to detect weld defects, which are introduced for various reasons during the process. We describe how deep-learning methods can be applied to detect weld defects in real time by recording the welding process with microphones and a camera. Our findings are based on a large database of more than 4000 welding samples we collected, which covers different weld types, materials and defect categories. All deep...

10.48550/arxiv.2409.02290 preprint EN arXiv (Cornell University) 2024-09-03

The paper investigates the integration of heteroscedastic linear discriminant analysis (HLDA) into adaptively trained speech recognizers. Two different approaches are compared: the first is a variant of CMLLR-SAT, the second is based on our previously introduced method of constrained maximum-likelihood speaker normalization (CMLSN). For the latter, both the HLDA projection and the speaker-specific transformations for normalization are estimated w.r.t. a set of simple target models. It is investigated whether additional robustness can be achieved by...

10.1109/icassp.2006.1660238 article EN 2006-08-02

The degree of sleepiness in the Sleepy Language Corpus from the Interspeech 2011 Speaker State Challenge is predicted with regression and a very large feature vector. Most notable is the large gender difference, which can mainly be attributed to females showing their sleepiness less than males do.

10.1109/icassp.2014.6853746 article EN 2014-05-01

Most speech recognition systems are based on Mel-frequency cepstral coefficients and their first- and second-order derivatives. The derivatives are normally approximated by fitting a linear regression line to a fixed-length segment of consecutive frames. The time resolution and smoothness of the estimated derivative depend on the length of the segment. We present an approach to improve the representation of speech dynamics, which is based on the combination of multiple time resolutions. The resulting feature vector is transformed to reduce its dimension and correlation...

10.1109/asru.2001.1034583 article EN 2005-08-24
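
The regression-based derivative and its combination over several segment lengths can be sketched as follows. The slope of a least-squares line through frames t−N … t+N is d_t = Σₙ n·(c_{t+n} − c_{t−n}) / (2·Σₙ n²); edge frames are repeated at the borders. The half-widths (1, 2, 4) are hypothetical, and the subsequent dimension-reducing transform mentioned in the abstract is not shown.

```python
from typing import List, Sequence

Frames = List[List[float]]  # one list of cepstral coefficients per frame


def regression_delta(frames: Frames, half_width: int) -> Frames:
    """Least-squares slope over frames t-N..t+N for every coefficient."""
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, half_width + 1))
    deltas: Frames = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, half_width + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]  # repeat last frame at the end
                behind = frames[max(t - n, 0)][d]              # repeat first frame at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def multi_resolution_deltas(frames: Frames,
                            half_widths: Sequence[int] = (1, 2, 4)) -> Frames:
    """Concatenate, per frame, the deltas computed with several segment lengths."""
    per_width = [regression_delta(frames, w) for w in half_widths]
    combined: Frames = []
    for t in range(len(frames)):
        row: List[float] = []
        for deltas in per_width:
            row.extend(deltas[t])
        combined.append(row)
    return combined
```

Short windows track fast spectral changes, while long windows give smoother estimates; concatenating both lets a later transform pick the useful mix instead of committing to one segment length.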

The effect of accent on the performance of Automatic Speech Recognition (ASR) systems is a well-known problem. In this paper, we study accent variability in Indian English for an ASR task. We evaluate test vocabularies on HMMs trained on (a) accent-specific training data, (b) pooled training data, which combines all accents, and (c) pooled training data of reduced size, matching the accent-specific data. We demonstrate that the accent-specific training set performs best on a phonetically rich isolated word recognition task. But ... perform better than ... HMMs, indicating a possible approach of using a first stage of accent identification to choose the correct...

10.1109/slt.2008.4777881 article EN 2008-12-01