Speech Recognition for Medical Conversations

FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (stat.ML); Audio and Speech Processing (eess.AS)
DOI: 10.21437/interspeech.2018-40 Publication Date: 2018-08-28T09:55:42Z
ABSTRACT
In this paper we document our experiences with developing speech recognition for medical transcription, a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines: a Connectionist Temporal Classification (CTC) phoneme based model and a Listen Attend and Spell (LAS) grapheme based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in the corpus, a significant amount of effort was invested in data cleaning issues. We describe a two-stage strategy we followed for segmenting the data. The data cleanup and development of a matched language model was essential to the success of the CTC based models. The LAS based models, however, were found to be resilient to alignment and transcript noise and did not require the use of language models. CTC models were able to achieve a word error rate of 20.1%, and the LAS models were able to achieve 18.3%. Our analysis shows that both models perform well on important medical utterances and therefore can be practical for transcribing medical conversations.
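The word error rates quoted in the abstract follow the usual definition: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below is a minimal illustration of that definition only. It assumes whitespace tokenization and no text normalization, the function name `word_error_rate` is hypothetical, and the example sentences are invented rather than taken from the corpus. The abstract does not say which scoring tool or normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Invented example: one dropped word out of five reference words.
    print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Under this definition a rate of 18.3% corresponds to roughly 18 word errors per 100 reference words. The rate can exceed 100% when the hypothesis contains many insertions.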