NFDI4DS | UHH-SEMS - Publication Details

Georg Stemmer

ORCID: 0009-0008-9871-2423

About

Contact & Profiles

OpenAlex author ID: A5088131608

Research Areas

Speech Recognition and Synthesis
Speech and Audio Processing
Music and Audio Processing
Speech and dialogue systems
Natural Language Processing Techniques
Topic Modeling
Voice and Speech Disorders
Phonetics and Phonology Research
Neural Networks and Applications
Advanced Data Compression Techniques
Robotics and Sensor-Based Localization
Advanced Image and Video Retrieval Techniques
Geographic Information Systems Studies
Gene expression and cancer classification
Industrial Vision Systems and Defect Detection
Robotics and Automated Systems
Welding Techniques and Residual Stresses
Emotion and Mood Recognition
Algorithms and Data Compression
Multimodal Machine Learning Applications
Digital Communication and Language
Time Series Analysis and Forecasting
Advanced Text Analysis Techniques
Face recognition and analysis
Sensor Technology and Measurement Systems

Intel (Germany)
2021

Intel (United States)
2015-2018

Intel (United Kingdom)
2017

Friedrich-Alexander-Universität Erlangen-Nürnberg
1999-2014

Siemens (Germany)
2006-2010

Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare
2006

Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis

OPENALEX - Publications

Tobias Bocklet Elmar Nöth Georg Stemmer Hana Růžičková Jan Rusz

70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies have revealed that voice and prosody are among the earliest indicators of PD. The aim of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts and monologues. Classification is performed in either case by SVMs. A correlation-based feature...

10.1109/asru.2011.6163978 article EN 2011-12-01

Revising Perceptual Linear Prediction (PLP)

Florian Hönig Georg Stemmer Christian Hacker Fabio Brugnara

10.21437/interspeech.2005-138 article EN Interspeech 2005 2005-09-04

Adaptive Training Using Simple Target Models

Georg Stemmer Fabio Brugnara Diego Giuliani

Adaptive training aims at reducing the influence of speaker, channel and environment variability on the acoustic models. We describe a normalization approach to adaptive training. Phonetically irrelevant variability is reduced at the beginning of the training procedure w.r.t. a set of target models. The target models can be HMMs or a Gaussian mixture model (GMM). CMLLR is applied to normalize the features. The normalized data contains less unwanted variability and is used to generate and train the recognition models. Employing a GMM as target model leads to a text-independent normalization that can be embedded into the front-end. On broadcast...

10.1109/icassp.2005.1415284 article EN 2006-10-11

Efficient End-to-End Audio Embeddings Generation for Audio Classification on Target Applications

Paulo Lopez‐Meyer Juan A. del Hoyo Ontiveros Hong Lu Georg Stemmer

We describe a general-purpose end-to-end audio embeddings generator that can be easily adapted to various acoustic scene and event classification applications. In contrast to many other models for audio classification, this model does not require a separate feature extraction step, but processes audio samples directly, which simplifies its porting to hardware platforms. Our approach learns a generic embedding representation that is pre-trained on a large dataset. It is then fine-tuned via transfer learning with limited data...

10.1109/icassp39728.2021.9414229 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Acoustic normalization of children's speech

Georg Stemmer Christian Hacker Stefan Steidl Elmar Nöth

Young speakers are not represented adequately in current speech recognizers. In this paper we focus on the problem of adapting the acoustic frontend of a speech recognizer which has been trained on adults’ speech to achieve better performance on speech from children. We introduce and evaluate a method to perform non-linear VTLN by an unconstrained data-driven optimization of the filterbank. A second approach normalizes the speaking rate of the young speakers with the PSOLA algorithm. Significant reductions in word error rate have been achieved.

10.21437/eurospeech.2003-415 article EN 2003-09-01

Age and gender recognition based on multiple systems - early vs. late fusion

Tobias Bocklet Georg Stemmer Viktor Zeißler Elmar Nöth

This paper focuses on the automatic recognition of a person’s age and gender based only on his or her voice. Up to five different systems are compared and combined in different configurations: three systems model the speaker’s characteristics in different feature spaces, i.e., MFCC, PLP, TRAPS, by Gaussian mixture models. The features of these systems are the concatenated mean vectors. System number 4 uses a physical two-mass vocal fold model and estimates, in a data-driven optimization procedure, 9 glottal features from voiced speech sections. For each utterance the minimum, maximum...

10.21437/interspeech.2010-748 article EN Interspeech 2010 2010-09-26

Acoustic modeling of foreign words in a German speech recognition system

Georg Stemmer Elmar Nöth Heinrich Niemann

The paper deals with the development of acoustic models of foreign words for a German speech recognizer. The recognition quality of foreign words is crucial for the overall performance of a system in application fields like spoken dialogue systems, when foreign words occur as proper names. One of the main problems in the modeling of foreign words is the limitation of the training data, which must contain samples of the non-native pronunciation of foreign sounds. In order to obtain robust models, which are still precise enough, we compare several methods to map or merge phonemes, which are pronounced in a similar way by...

10.21437/eurospeech.2001-642 article EN 2001-09-03

Analyzing features for automatic age estimation on cross-sectional data

Werner Spiegl Georg Stemmer Eva Lasarcyk Varada Kolhatkar Andrew Cassidy and 7 more

We develop an acoustic feature set for the estimation of a person’s age from a recorded speech signal. The baseline features are Mel-frequency cepstral coefficients (MFCCs), which are extended by various prosodic features, pitch and formant frequencies. From experiments on the University of Florida Vocal Aging Database we can draw different conclusions. On the one hand, adding prosodic, pitch and formant features to the MFCC baseline leads to relative reductions of the mean absolute error between 4% and 20%. Improvements are even larger when perceptual age labels are taken...

10.21437/interspeech.2009-740 article EN Interspeech 2009 2009-09-06
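
The abstract above reports its gains as relative reductions of the mean absolute error. As a rough illustration of that arithmetic only, the following sketch computes both quantities. The function names and all age values are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical ages in years, chosen only to make the arithmetic easy to follow.
true_ages = [20, 40, 60, 80]
baseline_predictions = [30, 30, 70, 70]   # stands in for a baseline system
extended_predictions = [28, 32, 68, 72]   # stands in for an extended feature set

baseline_mae = mean_absolute_error(true_ages, baseline_predictions)
extended_mae = mean_absolute_error(true_ages, extended_predictions)
reduction = relative_reduction(baseline_mae, extended_mae)
```

With these made-up numbers the error drops from 10 to 8 years, a relative reduction of 20%.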

Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis

Colin W. Wightman Ann K. Syrdal Georg Stemmer Alistair Conkie Mark C. Beutnagel

10.21437/icslp.2000-211 article EN 6th International Conference on Spoken Language Processing (ICSLP 2000) 2000-10-16

Combined Weighted Prediction Error and Minimum Variance Distortionless Response for dereverberation

Alejandro Cohen Georg Stemmer S. Ingalsuo Shmulik Markovich‐Golan

Considering the dereverberation problem using multichannel processing, two main paradigms exist. The first paradigm utilizes the long-term correlation of the reverberant component for reducing it, e.g. Weighted Prediction Error (WPE) [1]. The second paradigm treats the reverberation as a diffuse noise field, statistically independent of the direct speech component, and aims to reduce it using a superdirective beamformer, e.g. [2]. Here we propose to combine the two paradigms in a two-stage algorithm. The first stage comprises the WPE method, Minimum Variance...

10.1109/icassp.2017.7952195 article EN 2017-03-01

STOCHASTIC SEGMENT MODELS OF EUKARYOTIC PROMOTER REGIONS

Uwe Ohler Georg Stemmer Stefan Harbeck Heinrich Niemann

10.1142/9789814447331_0036 article EN Biocomputing 1999-12-01

Adaptation in the pronunciation space for non-native speech recognition

Georg Stemmer Stefan Steidl Christian Hacker Elmar Nöth

We introduce a new technique to improve the recognition of non-native speech. The underlying assumption is that for each non-native pronunciation of a speech sound, there is at least one sound in the target language that has a similar native pronunciation. The adaptation is performed by HMM interpolation between adequate native acoustic models. The interpolation partners are determined automatically in a data-driven manner. Our experiments show that this technique is suitable for both the offline adaptation to a whole group of speakers as well as the unsupervised online adaptation to a single speaker. Results are given...

10.21437/interspeech.2004-11 article EN Interspeech 2004 2004-10-04

Towards robust automatic evaluation of pathologic telephone speech

Korbinian Riedhammer Georg Stemmer Tino Haderlein Maria Schuster F. Rosanowski and 2 more

For many aspects of speech therapy an objective evaluation of the intelligibility of a patient's speech is needed. We investigate the evaluation of intelligibility by means of automatic speech recognition. Previous studies have shown that measures like word accuracy are consistent with human experts' ratings. To ease the patient's burden, it is highly desirable to conduct the assessment via phone. However, the telephone channel influences the quality of the speech signal, which negatively affects the results. To reduce inaccuracies, we propose a combination of two speech recognizers. Experiments on sets...

10.1109/asru.2007.4430200 article EN 2007-01-01
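
The abstract above uses word accuracy as its intelligibility measure. A minimal sketch of the standard definition follows: one minus the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The function name and the example sentences are hypothetical, and the paper's own scoring setup may differ.

```python
def word_accuracy(reference, hypothesis):
    """Return 1 - (substitutions + deletions + insertions) / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(min(
                previous_row[j] + 1,                       # deletion
                current_row[j - 1] + 1,                    # insertion
                previous_row[j - 1] + substitution_cost,   # match or substitution
            ))
        previous_row = current_row

    errors = previous_row[-1]
    return 1.0 - errors / len(ref_words)


# Hypothetical example: one of six reference words is missing in the output.
accuracy = word_accuracy("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, this measure can become negative when the recognizer outputs many more words than the reference contains.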

Unsupervised Welding Defect Detection Using Audio And Video

Georg Stemmer Jose A. Lopez Juan A. del Hoyo Ontiveros Arvind Raju Tara Thimmanaik and 1 more

In this work we explore the application of AI to robotic welding. Robotic welding is a widely used technology in many industries, but robots currently do not have the capability to detect welding defects which get introduced due to various reasons in the welding process. We describe how deep-learning methods can be applied to detect weld defects in real-time by recording the welding process with microphones and a camera. Our findings are based on a large database with more than 4000 welding samples we collected, which covers different weld types, materials and defect categories. All deep...

10.48550/arxiv.2409.02290 preprint EN arXiv (Cornell University) 2024-09-03

Integration of Heteroscedastic Linear Discriminant Analysis (HLDA) Into Adaptive Training

Georg Stemmer Fabio Brugnara

The paper investigates the integration of heteroscedastic linear discriminant analysis (HLDA) into adaptively trained speech recognizers. Two different approaches are compared: the first is a variant of CMLLR-SAT, the second is based on our previously introduced method of constrained maximum-likelihood speaker normalization (CMLSN). For the latter, both the HLDA projection and the speaker-specific transformations for CMLSN are estimated w.r.t. a set of simple target-models. It is investigated if additional robustness can be achieved by...

10.1109/icassp.2006.1660238 article EN 2006-08-02

MOBSY: Integration of vision and dialogue in service robots

Matthias Zobel Joachim Denzler Benno Heigl Elmar Nöth Dietrich Paulus and 2 more

10.1007/s00138-002-0092-z article EN Machine Vision and Applications 2003-04-01

Are men more sleepy than women or does it only look like — Automatic analysis of sleepy speech

Florian Hönig Anton Batliner Tobias Bocklet Georg Stemmer Elmar Nöth and 2 more

The degree of sleepiness in the Sleepy Language Corpus from the Interspeech 2011 Speaker State Challenge is predicted with regression and a very large feature vector. Most notable is the great gender difference, which can mainly be attributed to females showing their sleepiness less than males do.

10.1109/icassp.2014.6853746 article EN 2014-05-01

Multiple time resolutions for derivatives of Mel-frequency cepstral coefficients

Georg Stemmer Christian Hacker Elmar Nöth H. Niemann

Most speech recognition systems are based on Mel-frequency cepstral coefficients and their first- and second-order derivatives. The derivatives are normally approximated by fitting a linear regression line to a fixed-length segment of consecutive frames. The time resolution and smoothness of the estimated derivative depend on the length of the segment. We present an approach to improve the representation of speech dynamics, which is based on the combination of multiple time resolutions. The resulting feature vector is transformed to reduce its dimension and the correlation...

10.1109/asru.2001.1034583 article EN 2005-08-24
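
The abstract above describes derivatives estimated by fitting a regression line over a fixed-length segment of frames, and combining several segment lengths. The sketch below shows the standard regression-slope formula applied with more than one window size. The half-widths (1, 2, 4), the function names and the toy input are assumptions for illustration, and the dimension-reducing transform mentioned in the abstract is not shown.

```python
def regression_delta(frames, half_width):
    """Slope of a least-squares line through 2*half_width + 1 consecutive frames."""
    num_frames = len(frames)
    dim = len(frames[0])
    denominator = 2.0 * sum(n * n for n in range(1, half_width + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            accumulator = 0.0
            for n in range(1, half_width + 1):
                # Edge frames are repeated so every window has full length.
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                accumulator += n * (ahead - behind)
            row.append(accumulator / denominator)
        deltas.append(row)
    return deltas


def multi_resolution_deltas(frames, half_widths=(1, 2, 4)):
    """Concatenate, per frame, the deltas computed with each window half-width."""
    per_width = [regression_delta(frames, width) for width in half_widths]
    return [
        [value for deltas in per_width for value in deltas[t]]
        for t in range(len(frames))
    ]


# Toy input: two coefficients that change linearly over ten frames.
ramp = [[0.5 * t, -1.0 * t] for t in range(10)]
features = multi_resolution_deltas(ramp)
```

Short windows follow fast changes but are noisier, while long windows give smoother estimates. On a perfectly linear input such as the ramp, all window sizes agree on the slope away from the edges.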

Accented Indian English ASR: Some early results

Kaustubh Kulkarni Sohini Sengupta V. Ramasubramanian Josef G. Bauer Georg Stemmer

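The problem of the effect of accent on the performance of Automatic Speech Recognition (ASR) systems is well known. In this paper, we study the effect of accent variability on the performance of the Indian English ASR task. We evaluate test vocabularies on HMMs trained on (a) accent-specific training data, (b) accent-pooled training data which combines all the accent-specific training data, and (c) accent-pooled training data of reduced size matching the size of the accent-specific training data. We demonstrate that the accent-pooled training set performs best on phonetically rich isolated word recognition. But the accent-specific HMMs perform better than the reduced-size accent-pooled HMMs, indicating a possible approach of using a first stage of accent identification to choose the correct...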

10.1109/slt.2008.4777881 article EN 2008-12-01