NFDI4DS | UHH-SEMS - Publication Details

Adopting Whisper for Confidence Estimation

FOS: Computer and information sciences; Computer Science - Machine Learning; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; Electrical Engineering and Systems Science - Audio and Speech Processing; Machine Learning (cs.LG)

DOI: 10.48550/arxiv.2502.13446

Publication Date: 2025-02-19

AUTHORS (4)

Vaibhav Aggarwal

Shabari S Nair

Yash Verma

Yash Jogi

ABSTRACT

Recent research on word-level confidence estimation for speech recognition systems has primarily focused on lightweight models known as Confidence Estimation Modules (CEMs), which rely on hand-engineered features derived from Automatic Speech Recognition (ASR) outputs. In contrast, we propose a novel end-to-end approach that leverages the ASR model itself (Whisper) to generate confidence scores. Specifically, we introduce a method in which Whisper is fine-tuned to produce scalar confidence scores given an audio input and its corresponding hypothesis transcript. Our experiments demonstrate that the Whisper-tiny model, comparable in size to a strong CEM baseline, achieves similar performance on the in-domain dataset and surpasses the baseline on eight out-of-domain datasets, whereas Whisper-large consistently outperforms the baseline by a substantial margin across all datasets.
