Performant ASR Models for Medical Entities in Accented Speech

FOS: Computer and information sciences Sound (cs.SD) Computer Science - Computation and Language Audio and Speech Processing (eess.AS) FOS: Electrical engineering, electronic engineering, information engineering Computation and Language (cs.CL) Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing
DOI: 10.21437/interspeech.2024-2261 Publication Date: 2024-09-01T07:10:12Z
ABSTRACT
Recent strides in automatic speech recognition (ASR) have accelerated their application in the medical domain where their performance on accented medical named entities (NE) such as drug names, diagnoses, and lab results, is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset of 93 African accents. Our analysis reveals that despite some models achieving low overall word error rates (WER), errors in clinical entities are higher, potentially posing substantial risks to patient safety. To empirically demonstrate this, we extract clinical entities from transcripts, develop a novel algorithm to align ASR predictions with these entities, and compute medical NE Recall, medical WER, and character error rate. Our results show that fine-tuning on accented clinical speech improves medical WER by a wide margin (25-34 % relative), improving their practical applicability in healthcare environments.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (0)
CITATIONS (1)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....