NFDI4DS | UHH-SEMS - Publication Details

Tobias Bocklet

ORCID: 0009-0008-7780-8821

About

Contact & Profiles

OpenAlex author ID: A5033302750

Research Areas

Speech Recognition and Synthesis
Speech and Audio Processing
Voice and Speech Disorders
Music and Audio Processing
Phonetics and Phonology Research
Cleft Lip and Palate Research
Natural Language Processing Techniques
Stuttering Research and Treatment
Topic Modeling
Head and Neck Cancer Studies
Language Development and Disorders
Text Readability and Simplification
Industrial Vision Systems and Defect Detection
Emotion and Mood Recognition
Speech and Dialogue Systems
Fault Detection and Control Systems
Infection Control and Ventilation
Advanced Data Compression Techniques
Dysphagia Assessment and Management
Advanced Surface Polishing Techniques
Context-Aware Activity Recognition Systems
Sentiment Analysis and Opinion Mining
Sparse and Compressive Sensing Techniques
Non-Destructive Testing Techniques
Embedded Systems Design Techniques

Georg Simon Ohm University of Applied Sciences Nuremberg
2020-2025

Intel (United States)
2017-2023

Friedrich-Alexander-Universität Erlangen-Nürnberg
2007-2022

Intel (Germany)
2016-2018

Universitätsklinikum Erlangen
2010-2012

SRI International
2009

Menlo School
2009

The INTERSPEECH 2012 speaker trait challenge

OPENALEX - Publications

Björn W. Schuller Stefan Steidl Anton Batliner Elmar Nöth Alessandro Vinciarelli and 7 more

The INTERSPEECH 2012 Speaker Trait Challenge provides for the first time a unified test-bed for 'perceived' speaker traits: Personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In this paper, we describe these three Sub-Challenges, the Challenge conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants.

10.21437/interspeech.2012-86 article EN Interspeech 2012 2012-09-09

NeuroSpeech: An open-source software for Parkinson's speech analysis

Juan Rafael Orozco‐Arroyave Juan Camilo Vásquez-Correa J. F. Vargas‐Bonilla Raman Arora Najim Dehak and 11 more

10.1016/j.dsp.2017.07.004 article EN Digital Signal Processing 2017-07-17

Towards an automatic evaluation of the dysarthria level of patients with Parkinson's disease

Juan Camilo Vásquez-Correa Juan Rafael Orozco‐Arroyave Tobias Bocklet Elmar Nöth

10.1016/j.jcomdis.2018.08.002 article EN Journal of Communication Disorders 2018-08-20

Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

Tobias Bocklet Andreas Maier Josef G. Bauer Felix Burkhardt Elmar Nöth

This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation, respectively. For the second approach, a GMM model is trained for each speaker of the test and training set. The means are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different...

10.1109/icassp.2008.4517932 article EN Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing 2008-03-01
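
The supervector idea summarized in the abstract above can be illustrated with a minimal sketch. It assumes a diagonal-covariance UBM and mean-only MAP adaptation with a relevance factor; the function names, the toy UBM, and the relevance value of 16 are illustrative assumptions, not details taken from the paper. The resulting vectors would then be passed to an SVM classifier, which is not shown here.

```python
import math

def diag_gaussian(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x."""
    log_p = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mean, var))
    return math.exp(log_p)

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM towards one speaker's frames."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                      # soft frame counts per component
    sums = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted frame sums
    for x in frames:
        likes = [w * diag_gaussian(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame is too far from every component; skip it
        for c in range(n_comp):
            post = likes[c] / total
            counts[c] += post
            for d in range(dim):
                sums[c][d] += post * x[d]
    adapted = []
    for c in range(n_comp):
        if counts[c] == 0.0:
            adapted.append(list(means[c]))  # no evidence: keep the UBM mean
            continue
        alpha = counts[c] / (counts[c] + relevance)
        data_mean = [s / counts[c] for s in sums[c]]
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[c])])
    return adapted

def supervector(adapted_means):
    """Concatenate all component means into one fixed-length vector."""
    return [value for mean in adapted_means for value in mean]

# Toy UBM with two 2-D components (hypothetical values).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [10.0, 10.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
speaker_frames = [[1.0, 1.0]] * 16

sv = supervector(map_adapt_means(speaker_frames, ubm_weights,
                                 ubm_means, ubm_vars))
```

Every speaker yields a vector of the same length (number of components times feature dimension), regardless of utterance duration, which is what makes the representation usable as SVM input.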

Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis

Tobias Bocklet Elmar Nöth Georg Stemmer Hana Růžičková Jan Rusz

70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies revealed that voice and prosody is one of the earliest indicators of PD. The issue of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts and monologues. Classification is performed in either case by SVMs. A correlation-based feature...

10.1109/asru.2011.6163978 article EN 2011-12-01

A Survey on perceived speaker traits: Personality, likability, pathology, and the first challenge

Björn W. Schuller Stefan Steidl Anton Batliner Elmar Nöth Alessandro Vinciarelli and 7 more

10.1016/j.csl.2014.08.003 article EN Computer Speech & Language 2014-08-27

Automatic detection of articulation disorders in children with cleft lip and palate

Andreas Maier Florian Hönig Tobias Bocklet Elmar Nöth Florian Stelzle and 2 more

Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy to apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such...

10.1121/1.3216913 article EN The Journal of the Acoustical Society of America 2009-11-01

Automatic evaluation of parkinson's speech — acoustic, prosodic and voice related cues

Tobias Bocklet Stefan Steidl Elmar Nöth Sabine Skodda

Articulation and phonation is affected in 70% to 90% of patients with Parkinson’s disease (PD). This study focuses on the question of whether speech carries information about (1) PD being present in a speaker or not, and (2) the severity of PD (if present). We first perform classification experiments focusing on the automatic detection of PD as a 2-class problem (PD vs. healthy speakers). The estimation of the severity is described as a 3-class task based on Unified Parkinson’s Disease Rating Scale (UPDRS) ratings. We employ acoustic, prosodic and glottal features...

10.21437/interspeech.2013-313 article EN Interspeech 2013 2013-08-25

Detecting Vocal Fatigue with Neural Embeddings

Sebastian P. Bayerl Dominik Wagner Ilja Baumann Tobias Bocklet Korbinian Riedhammer

10.1016/j.jvoice.2023.01.012 article EN Journal of Voice 2023-02-01

The SRI NIST 2008 speaker recognition evaluation system

Sachin Kajarekar Nicolas Scheffer Martin Graciarena Elizabeth Shriberg Andreas Stolcke and 2 more

The SRI speaker recognition system for the 2008 NIST speaker recognition evaluation (SRE) incorporates a variety of models and features, both cepstral and stylistic. We highlight the improvements made to specific subsystems and analyze the performance of various subsystem combinations in different data conditions. We show the importance of language and nativeness conditioning, as well as the role of ASR for speaker verification.

10.1109/icassp.2009.4960556 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2009-04-01

Multi-view representation learning via gcca for multimodal analysis of Parkinson's disease

Juan Camilo Vásquez-Correa Juan Rafael Orozco‐Arroyave Raman Arora Elmar Nöth Najim Dehak and 10 more

Information from different bio-signals such as speech, handwriting, and gait have been used to monitor the state of Parkinson's disease (PD) patients; however, all the multimodal bio-signals may not always be available. We propose a method based on multi-view representation learning via generalized canonical correlation analysis (GCCA) for features extracted from handwriting and gait that can complement speech-based features. Three problems are addressed: classification of PD patients vs. healthy controls, prediction...

10.1109/icassp.2017.7952700 article EN 2017-03-01

Automatic Intelligibility Assessment of Speakers After Laryngeal Cancer by Means of Acoustic Modeling

Tobias Bocklet Korbinian Riedhammer Elmar Nöth Ulrich Eysholdt Tino Haderlein

10.1016/j.jvoice.2011.04.010 article EN Journal of Voice 2011-08-06

Speech Intelligibility Enhancement After Maxillary Denture Treatment and Its Impact on Quality of Life

Christian Knipfer Max Riemann Tobias Bocklet Elmar Noeth Maria Schuster and 4 more

Tooth loss and its prosthetic rehabilitation significantly affect speech intelligibility. However, little is known about the influence of speech deficiencies on oral health-related quality of life (OHRQoL). The aim of this study was to investigate whether speech intelligibility enhancement through prosthetic rehabilitation significantly influences OHRQoL in patients wearing complete maxillary dentures. Speech intelligibility by means of an automatic speech recognition system (ASR) was prospectively evaluated and compared with subjectively assessed Oral Health Impact Profile (OHIP)...

10.11607/ijp.3597 article EN The International Journal of Prosthodontics 2014-01-01

Optimized Self-supervised Training with BEST-RQ for Speech Recognition

Ilja Baumann Dominik Wagner Korbinian Riedhammer Tobias Bocklet

Self-supervised learning has been successfully used for various speech related tasks, including automatic speech recognition. BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) has achieved state-of-the-art results in speech recognition. In this work, we further optimize the BEST-RQ approach using Kullback-Leibler divergence as an additional regularizing loss and multi-codebook extension per cluster derived from low-level feature clustering. Preliminary experiments on train-100 split of LibriSpeech...

10.48550/arxiv.2501.16131 preprint EN arXiv (Cornell University) 2025-01-27
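
The random-projection quantizer that gives BEST-RQ its name can be sketched as follows. This is a minimal illustration of the general idea (a frozen random projection plus a frozen random codebook that turns speech frames into discrete training targets), not the authors' implementation; the function names, dimensions, and seed are hypothetical, and the Kullback-Leibler regularizer and cluster-specific codebooks mentioned in the abstract are not shown.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def make_quantizer(feat_dim, code_dim, codebook_size, seed=0):
    """Create a frozen random projection matrix and a frozen random codebook."""
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
                  for _ in range(code_dim)]
    codebook = [normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
                for _ in range(codebook_size)]
    return projection, codebook

def quantize(frame, projection, codebook):
    """Return the index of the codebook entry nearest to the projected frame."""
    projected = normalize([sum(p * x for p, x in zip(row, frame))
                           for row in projection])
    dists = [sum((a - b) ** 2 for a, b in zip(projected, code))
             for code in codebook]
    return dists.index(min(dists))

# Hypothetical sizes: 4-dim input features, 3-dim codes, 8 codebook entries.
projection, codebook = make_quantizer(feat_dim=4, code_dim=3, codebook_size=8)
frame = [0.2, -1.0, 0.5, 3.0]
label = quantize(frame, projection, codebook)
```

Neither the projection nor the codebook is trained; during pre-training the encoder is asked to predict `label` for masked frames. A multi-codebook variant could be sketched by creating several quantizers with different seeds and predicting one label per codebook.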

Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models

Christopher Simic Korbinian Riedhammer Tobias Bocklet

We present an approach to Audio-Visual Speech Recognition that builds on a pre-trained Whisper model. To infuse visual information into this audio-only model, we extend it with an AV fusion module and LoRa adapters, one of the most up-to-date adapter approaches. One advantage of adapter-based approaches is that only a relatively small number of parameters are trained, while the basic model remains unchanged. Common AVSR approaches train single models to handle several noise categories and noise levels simultaneously....

10.48550/arxiv.2502.01709 preprint EN arXiv (Cornell University) 2025-02-03
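
The point made in the abstract above, that adapters train few parameters while the base model stays unchanged, can be illustrated with a generic low-rank adapter (LoRA) layer. This is a sketch of the standard LoRA formulation, output = W x + (alpha / r) B A x with W frozen, and not the AVSR architecture of the paper; the class name, sizes, and initialization values are assumptions.

```python
import random

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # pre-trained weights, never updated
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the layer's output unchanged.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def num_trainable(self):
        """Only A and B would receive gradient updates."""
        return sum(len(r) for r in self.a) + sum(len(r) for r in self.b)

# Hypothetical 4x6 pre-trained weight matrix with a rank-2 adapter.
rng = random.Random(1)
weight = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]
layer = LoRALinear(weight, rank=2)
x = [1.0, 0.5, -0.5, 2.0, 0.0, 1.5]
y = layer.forward(x)
```

With rank `r`, the adapter adds `r * (in_dim + out_dim)` trainable values, 20 in this toy case against 24 frozen ones; the savings become substantial only for the much larger matrices of a real model.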

SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models

Seanie Lee Dong Bok Lee David Wagner Myoung-Ah Kang Haebin Seong and 3 more

Deploying large language models (LLMs) in real-world applications requires robust safety guard models to detect and block harmful user prompts. While large safety guard models achieve strong performance, their computational cost is substantial. To mitigate this, smaller distilled models are used, but they often underperform on "hard" examples where the larger model provides accurate predictions. We observe that many inputs can be reliably handled by the smaller model, while only a small fraction require the larger model's capacity. Motivated by this, we propose...

10.48550/arxiv.2502.12464 preprint EN arXiv (Cornell University) 2025-02-17

Digital Operating Mode Classification of Real-World Amateur Radio Transmissions

Maximilian Bundscherer Thomas Schmitt Ilja Baumann Tobias Bocklet

10.1109/icassp49660.2025.10889837 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Optimized Self-supervised Training with BEST-RQ for Speech Recognition

Ilja Baumann Dominik Wagner Korbinian Riedhammer Tobias Bocklet

10.1109/icassp49660.2025.10889362 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models

Christopher Simic Korbinian Riedhammer Tobias Bocklet

10.1109/icassp49660.2025.10889566 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Automatic, computer‐based speech assessment on edentulous patients with and without complete dentures – preliminary results

Florian Stelzle B. Ugrinovic Christian Knipfer Tobias Bocklet Elmar Nöth and 4 more

Summary: Dental rehabilitation of edentulous patients with complete dentures includes not only aesthetics and mastication of food, but also speech quality. It was the aim of this study to introduce and validate a computer‐based speech recognition system (ASR) for automatic speech assessment in edentulous patients after dental rehabilitation with complete dentures. To examine the impact of dentures on speech production, the speech outcome of edentulous patients with and without complete dentures was compared. Twenty‐eight patients reading a standardized text were recorded twice – with and without their complete dentures in situ. A control group of 40 healthy subjects with natural dentition was recorded under the same...

10.1111/j.1365-2842.2009.02047.x article EN Journal of Oral Rehabilitation 2010-01-18

Age and gender recognition based on multiple systems - early vs. late fusion

Tobias Bocklet Georg Stemmer Viktor Zeißler Elmar Nöth

This paper focuses on the automatic recognition of a person’s age and gender based only on his or her voice. Up to five different systems are compared and combined in different configurations: three systems model the speaker’s characteristics in different feature spaces, i.e., MFCC, PLP, TRAPS, by Gaussian mixture models. The features of these systems are the concatenated mean vectors. System number 4 uses a physical two-mass vocal model and estimates in a data-driven optimization procedure 9 glottal features from voiced speech sections. For each utterance the minimum, maximum...

10.21437/interspeech.2010-748 article EN Interspeech 2010 2010-09-26

Speaker recognition using syllable-based constraints for cepstral frame selection

Tobias Bocklet Elizabeth Shriberg

We describe a new GMM-UBM speaker recognition system that uses standard cepstral features, but selects different frames of speech for different subsystems. Subsystems, or "constraints", are based on syllable-level information and combined at the score level. Results on both the NIST 2006 and 2008 test data sets for the English telephone train and test condition reveal that a set of eight constraints performs extremely well, resulting in better performance than other commonly-used cepstral models. Given the still largely-unexplored world...

10.1109/icassp.2009.4960636 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2009-04-01

Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment

Catherine Middag Tobias Bocklet Jean‐Pierre Martens Elmar Nöth

Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, only making use of the acoustic...

10.21437/interspeech.2011-752 article EN Interspeech 2011 2011-08-27

Factors influencing relative speech intelligibility in patients with oral squamous cell carcinoma: a prospective study using automatic, computer-based speech analysis

Florian Stelzle Christian Knipfer Maria Schuster Tobias Bocklet Elmar Nöth and 6 more

10.1016/j.ijom.2013.05.021 article EN International Journal of Oral and Maxillofacial Surgery 2013-07-08