Speech Recognition for Medical Conversations
FOS: Computer and information sciences
Sound (cs.SD)
Computer Science - Computation and Language
Machine Learning (stat.ML)
02 engineering and technology
Computer Science - Sound
Statistics - Machine Learning
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Computation and Language (cs.CL)
Electrical Engineering and Systems Science - Audio and Speech Processing
DOI:
10.21437/interspeech.2018-40
Publication Date:
2018-08-28T09:55:42Z
AUTHORS (14)
ABSTRACT
In this paper we document our experiences with developing speech recognition for medical transcription, a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines: a Connectionist Temporal Classification (CTC) phoneme based model and a Listen Attend and Spell (LAS) grapheme based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in the corpus, a significant amount of effort was invested in data cleaning issues. We describe a two-stage strategy we followed for segmenting the data. The data cleanup and development of a matched language model was essential to the success of the CTC based models. The LAS based models, however, were found to be resilient to alignment and transcript noise and did not require the use of language models. CTC models were able to achieve a word error rate of 20.1%, and the LAS models were able to achieve 18.3%. Our analysis shows that both models perform well on important medical utterances and therefore can be practical for transcribing medical conversations.
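The abstract reports results as word error rate (WER). As a minimal sketch of how that metric is commonly defined (word-level edit distance divided by the number of reference words), the following may help; it is not the authors' scoring pipeline. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real scoring tools usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical utterance pair: one dropped word out of five.
    print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Under this definition, a WER of 18.3% corresponds to roughly 18 word-level errors (substitutions, deletions, or insertions) per 100 reference words.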
CITATIONS (35)