- Speech Recognition and Synthesis
- Voice and Speech Disorders
- Phonetics and Phonology Research
- Speech and Audio Processing
- Speech and Dialogue Systems
- Natural Language Processing Techniques
- Text Readability and Simplification
- Language Development and Disorders
- Topic Modeling
- Music Technology and Sound Studies
Google (United States)
2022-2025
This study investigates the performance of personalized automatic speech recognition (ASR) for recognizing disordered speech using small amounts of per-speaker adaptation data. We trained personalized models for 195 individuals with different types and severities of speech impairment, with training sets ranging in size from <1 minute to 18-20 minutes of speech. Word error rate (WER) thresholds were selected to determine Success Percentage (the percentage of speakers reaching the target WER) in different application scenarios. For the home automation scenario, 79% of speakers reached...
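The two quantities in this abstract, per-speaker WER and Success Percentage, can be sketched in a few lines. This is an illustrative implementation, not the paper's code; the example threshold below is hypothetical.

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    # Levenshtein distance over words, computed row by row.
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                         # deletion
                       d[j - 1] + 1,                     # insertion
                       prev + (r[i - 1] != h[j - 1]))    # substitution / match
            prev = cur
    return d[len(h)] / len(r)

def success_percentage(speaker_wers, target: float) -> float:
    """Percentage of speakers whose personalized model reaches the target WER."""
    return 100.0 * sum(w <= target for w in speaker_wers) / len(speaker_wers)
```

For example, `wer("turn on the lights", "turn on lights")` is 0.25 (one deletion over four reference words), and `success_percentage([0.05, 0.12, 0.30, 0.08], target=0.15)` is 75.0.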
Automatic Speech Recognition (ASR) systems, despite significant advances in recent years, still have much room for improvement, particularly in the recognition of disordered speech. Even so, erroneous transcripts from ASR models can help people with speech impairments be better understood, especially if the transcription doesn't significantly change the intended meaning. Evaluating the efficacy of this use case requires a methodology for measuring the impact of transcription errors on meaning and comprehensibility. Human evaluation is the gold...
This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of mode-specific training on ASR accuracy.
We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders, each rated for overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ~94K utterances from 100 speakers. We further found the models to generalize well (without further training) to the TORGO database [1] (100% accuracy), UASpeech [2] (0.93 correlation), and ALS-TDI PMP [3] (0.81 AUC)...
Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than on typical English speakers. It is hard to determine if a model can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generation, to provide a more informative measure of ASR quality and usefulness. Both BERTScore and WER were compared to prediction errors manually annotated by...
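The limitation motivating a semantic metric like BERTScore can be seen with a toy example (hypothetical utterances; a simplified substitution-only WER for equal-length word sequences): WER scores a meaning-preserving error and a meaning-reversing error identically, whereas an embedding-based metric would rate the first hypothesis as much closer to the reference.

```python
def word_errors(ref: str, hyp: str) -> float:
    """Substitution-only word error rate for equal-length word sequences
    (a simplification; full WER also counts insertions and deletions)."""
    r, h = ref.split(), hyp.split()
    assert len(r) == len(h)
    return sum(a != b for a, b in zip(r, h)) / len(r)

ref       = "please turn on the kitchen lights"
hyp_minor = "please turn on the kitchen light"    # meaning largely preserved
hyp_major = "please turn off the kitchen lights"  # intent reversed

# WER is blind to which word was substituted: both score one error in six.
assert word_errors(ref, hyp_minor) == word_errors(ref, hyp_major)
```

A semantic similarity score computed from contextual embeddings, which is what BERTScore provides, distinguishes these two cases; WER by construction cannot.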
Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of a speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases. We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases, which...
Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding speaker diversity in the database, adding human-reviewed transcript corrections and audio quality tags to 350K (of 1.2M total) recordings, and amassing a comprehensive set of metadata (including more...
This study investigates the impact of integrating a dataset of disordered speech recordings ($\sim$1,000 hours) into the fine-tuning of a near state-of-the-art ASR baseline system. Contrary to what one might expect, despite this data being less than 1% of the system's training data, we find considerable improvement in disordered speech recognition accuracy. Specifically, we observe a 33% improvement on prompted speech, and 26% on newly gathered spontaneous, conversational speech. Importantly, there is no significant performance decline on standard benchmarks....
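The "less than 1%" figure describes a data-mixing ratio during fine-tuning. A minimal sketch of such a mixing scheme, assuming simple per-example weighted sampling (the paper's actual recipe is not specified here; names and the 1% fraction below are illustrative):

```python
import random

def make_mixture(main_pool, disordered_pool, disordered_frac=0.01,
                 n_examples=10_000, seed=0):
    """Draw a fine-tuning example stream in which disordered-speech
    examples make up roughly `disordered_frac` of the total."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_examples):
        pool = disordered_pool if rng.random() < disordered_frac else main_pool
        stream.append(rng.choice(pool))
    return stream
```

With `disordered_frac=0.01`, about 1 in 100 sampled examples comes from the disordered-speech pool, matching the "less than 1% of training data" regime the abstract describes.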
We introduce a large language model (LLM) capable of processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in the LLM's vocabulary with audio tokens and teaches the model to recognize speech by fine-tuning on transcripts. We then use RL with rewards based on syntactic and semantic accuracy measures, further generalizing the LLM to disordered speech. While the resulting model does not outperform existing systems for...
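The vocabulary-reuse step, repurposing the rarest text-token ids as audio-token ids, can be sketched as follows. This is an illustrative reading of that idea, not the paper's implementation; `vocab_freq` is an assumed mapping from token id to corpus frequency.

```python
def repurpose_low_freq_tokens(vocab_freq: dict, n_audio_tokens: int) -> dict:
    """Reassign the n least-frequent text-token ids to serve as audio tokens.

    Returns a mapping audio_token_index -> reused vocabulary id, so the
    model's embedding table keeps its size while gaining audio slots.
    """
    # Sort ids by ascending corpus frequency; the rarest ids are reused.
    rarest = sorted(vocab_freq, key=vocab_freq.get)[:n_audio_tokens]
    return {i: tok_id for i, tok_id in enumerate(rarest)}
```

For example, with frequencies `{0: 100, 1: 1, 2: 50, 3: 2}` and two audio tokens, ids 1 and 3 (the rarest) are reassigned, so the mapping is `{0: 1, 1: 3}`.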