Jimmy Tobin

ORCID: 0009-0009-5733-9295
Research Areas
  • Speech Recognition and Synthesis
  • Voice and Speech Disorders
  • Phonetics and Phonology Research
  • Speech and Audio Processing
  • Speech and Dialogue Systems
  • Natural Language Processing Techniques
  • Text Readability and Simplification
  • Language Development and Disorders
  • Topic Modeling
  • Music Technology and Sound Studies

Google (United States)
2022-2025

10.1109/icassp49660.2025.10888006 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10888895 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

This study investigates the performance of personalized automatic speech recognition (ASR) for recognizing disordered speech using small amounts of per-speaker adaptation data. We trained personalized models for 195 individuals with different types and severities of speech impairment, with training sets ranging in size from <1 minute to 18-20 minutes. Word error rate (WER) thresholds were selected to determine Success Percentage (the percentage of speakers reaching the target WER) in different application scenarios. For the home automation scenario, 79% of speakers reached...
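The Success Percentage metric described above is simple to reproduce; a minimal sketch (the speaker IDs, WER values, and 0.15 threshold below are illustrative assumptions, not figures from the paper):

```python
def success_percentage(per_speaker_wer, wer_threshold):
    """Percentage of speakers whose personalized model reaches the target WER."""
    successes = sum(1 for wer in per_speaker_wer.values() if wer <= wer_threshold)
    return 100.0 * successes / len(per_speaker_wer)

# Hypothetical per-speaker WERs from personalized models (not real data)
wers = {"spk01": 0.08, "spk02": 0.25, "spk03": 0.12, "spk04": 0.40}
print(success_percentage(wers, wer_threshold=0.15))  # → 50.0
```

A different threshold would be chosen per application scenario, since a WER acceptable for home-automation commands may be too high for open-ended dictation.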

10.1109/icassp43922.2022.9747516 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Automatic Speech Recognition (ASR) systems, despite significant advances in recent years, still have much room for improvement, particularly in the recognition of disordered speech. Even so, erroneous transcripts from ASR models can help people with speech impairments be better understood, especially if the transcription doesn't significantly change the intended meaning. Evaluating the efficacy of this use case requires a methodology for measuring the impact of transcription errors on meaning and comprehensibility. Human evaluation is the gold...

10.1109/icassp48485.2024.10447177 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of mode-specific training on ASR accuracy.

10.1044/2024_jslhr-24-00045 article EN Journal of Speech Language and Hearing Research 2024-07-04

We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ~94K utterances from 100 speakers. We further found the models to generalize well (without further training) to the TORGO database[1] (100% accuracy), UASpeech[2] (0.93 correlation), and ALS-TDI PMP[3] (0.81 AUC)...

10.1109/icassp49357.2023.10095933 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than typical English speakers. It is hard to determine if models can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generation, to provide a more informative measure of ASR model quality and usefulness. Both BERTScore and WER were compared to prediction errors manually annotated by...
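WER, the baseline metric that BERTScore is compared against here, is just a word-level edit distance normalized by the reference length; a minimal sketch (the example sentences are made up for illustration):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[-1][-1] / max(len(ref), 1)

print(wer("turn on the kitchen lights", "turn the kitchen light"))  # → 0.4
```

Because WER charges every substitution a uniform cost, a semantically harmless error ("light" for "lights") counts the same as a meaning-destroying one, which is the gap an embedding-based metric like BERTScore is meant to address.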

10.21437/s4sg.2022-6 article EN 2022-09-24

Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of an impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases. We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases, which...

10.21437/interspeech.2021-1913 article EN Interspeech 2021 2021-08-27

10.21437/interspeech.2024-578 article EN Interspeech 2024 2024-09-01

Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding the speaker diversity of the database, adding human-reviewed transcript corrections and audio quality tags to 350K (of the 1.2M total) recordings, and amassing a comprehensive set of metadata (including more...

10.48550/arxiv.2409.09190 preprint EN arXiv (Cornell University) 2024-09-13

This study investigates the impact of integrating a dataset of disordered speech recordings (~1,000 hours) into the fine-tuning of a near state-of-the-art ASR baseline system. Contrary to what one might expect, despite this data being less than 1% of the training data of the ASR system, we find a considerable improvement in disordered speech recognition accuracy. Specifically, we observe a 33% improvement on prompted speech, and a 26% improvement on newly gathered spontaneous, conversational speech. Importantly, there is no significant performance decline on standard benchmarks...

10.48550/arxiv.2412.19315 preprint EN arXiv (Cornell University) 2024-12-26

We introduce a large language model (LLM) capable of processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in an LLM's vocabulary with audio tokens and enables the model to recognize speech by fine-tuning on transcripts. We then use RL with rewards based on syntactic and semantic accuracy measures, further generalizing the LLM to disordered speech. While the resulting LLM does not outperform existing systems for...

10.48550/arxiv.2501.00039 preprint EN arXiv (Cornell University) 2024-12-24

10.48550/arxiv.2303.07533 preprint EN other-oa arXiv (Cornell University) 2023-01-01

10.3389/conf.fnhum.2018.227.00001 article cc-by Frontiers in Human Neuroscience 2018-01-01

10.3389/conf.fnagi.2018.07.00001 article cc-by Frontiers in Aging Neuroscience 2018-01-01

10.48550/arxiv.2209.10591 preprint EN cc-by arXiv (Cornell University) 2022-01-01

10.48550/arxiv.2110.04612 preprint EN cc-by arXiv (Cornell University) 2021-01-01

10.48550/arxiv.2107.03985 preprint EN cc-by-sa arXiv (Cornell University) 2021-01-01