Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

DOI: 10.48550/arXiv.2405.14161
Publication Date: 2024-05-23
ABSTRACT
We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architectures with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level quality of pseudo labels without ground truth, thereby guiding model updates for effective unsupervised adaptation. Experimental results show that STAR achieves an average 13.5% relative reduction in word error rate across 14 target domains, and it sometimes even approaches the upper-bound performance of supervised adaptation. Surprisingly, we also observe that STAR prevents the adapted model from the common catastrophic forgetting problem without recalling source-domain data. Furthermore, STAR exhibits high data efficiency, requiring less than one hour of unlabeled data, and seamless generality to alternative large speech models and speech translation tasks. The code is intended to be open-sourced to the research community.
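To make the core idea concrete, below is a minimal sketch of quality-weighted pseudo-label training for an auto-regressive ASR decoder. It is not the authors' exact STAR indicator: the paper integrates step-wise decoding information into its token-level score, whereas this sketch uses plain softmax confidence as a stand-in quality weight. The function names (token_confidence, weighted_pseudo_label_loss), the pad_id convention, and the toy tensors are illustrative assumptions.

```python
# Sketch of confidence-weighted pseudo-label training (PyTorch only).
# Assumption: per-token softmax confidence stands in for the paper's
# token-level quality indicator; STAR's actual score also uses
# step-wise decoding information.

import torch
import torch.nn.functional as F


def token_confidence(logits: torch.Tensor, pseudo_labels: torch.Tensor) -> torch.Tensor:
    """Softmax probability the model assigns to each pseudo-label token.

    logits: (batch, seq_len, vocab) decoder outputs from a forward pass
    over the pseudo transcript; pseudo_labels: (batch, seq_len).
    """
    probs = logits.softmax(dim=-1)
    return probs.gather(-1, pseudo_labels.unsqueeze(-1)).squeeze(-1)  # (batch, seq_len)


def weighted_pseudo_label_loss(logits, pseudo_labels, pad_id=0):
    """Cross-entropy over pseudo labels, re-weighted by token-level quality.

    Tokens the model itself finds unreliable contribute less to the update,
    which is the general idea behind quality-guided self-training.
    """
    weights = token_confidence(logits, pseudo_labels).detach()  # no grad through weights
    ce = F.cross_entropy(
        logits.transpose(1, 2), pseudo_labels, reduction="none"  # (batch, seq_len)
    )
    mask = (pseudo_labels != pad_id).float()  # ignore padding positions
    return (weights * ce * mask).sum() / mask.sum().clamp(min=1.0)


if __name__ == "__main__":
    # Toy shapes only: in practice, logits would come from a speech foundation
    # model (e.g., Whisper) decoding over its own pseudo transcripts.
    batch, seq_len, vocab = 2, 5, 100
    logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
    pseudo = torch.randint(1, vocab, (batch, seq_len))
    loss = weighted_pseudo_label_loss(logits, pseudo)
    loss.backward()
    print(f"weighted pseudo-label loss: {loss.item():.4f}")
```

Detaching the weights is a deliberate choice in this sketch: the quality score should gate the gradient, not be optimized itself, otherwise the model could trivially inflate its own confidence.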