Mixat: A Data Set of Bilingual Emirati-English Speech

Data set
DOI: 10.48550/arxiv.2405.02578 Publication Date: 2024-05-04
ABSTRACT
This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. Mixat was developed to address the shortcomings of current speech recognition resources when applied to Emirati speech, and in particular to bilingual Emirati speakers who often mix and switch between their local dialect and English. The data set consists of 15 hours of speech derived from two public podcasts featuring native Emirati speakers, one of which is in the form of conversations between the host and a guest. Therefore, the collection contains examples of Emirati-English code-switching in both formal and natural conversational contexts. In this paper, we describe the process of data collection and annotation, and some of the features and statistics of the resulting data set. In addition, we evaluate the performance of pre-trained Arabic and multilingual ASR systems on our dataset, demonstrating the shortcomings of existing models on this low-resource dialectal Arabic and the additional challenge of recognizing code-switching in ASR. The dataset will be made publicly available for research use.
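The abstract does not name the metric used to evaluate the pre-trained ASR systems. Word error rate (WER) is a common choice for ASR evaluation, so the following is a minimal sketch, assuming WER over whitespace-separated tokens, of how a system transcript could be scored against a reference transcript. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the Mixat release.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference length.

    Assumes tokens are whitespace-separated words; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    # prev[j] = minimum edits to turn the first i-1 reference words into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # turning ref[:i] into an empty hypothesis costs i deletions
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical code-switched utterance (Arabic script mixed with English words).
    reference = "انا رايح the meeting باجر"
    hypothesis = "انا رايح the meeting"  # the final word was dropped by the recognizer
    print(word_error_rate(reference, hypothesis))
```

Running the script prints `0.2`: one deletion against a five-word reference. Tokenizing on whitespace treats Arabic and English words alike, so a code-switched segment is scored the same way as a monolingual one; practical Arabic ASR evaluations often also normalize orthographic variants before scoring, which this sketch does not do.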