Mixat: A Data Set of Bilingual Emirati-English Speech

Data set

DOI: 10.48550/arxiv.2405.02578

Publication Date: 2024-05-04

AUTHORS (2)

Maryam Al Ali

Hanan Aldarmaki

ABSTRACT

This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. Mixat was developed to address the shortcomings of current speech recognition resources when applied to Emirati speech, and in particular, to bilingual Emirati speakers who often mix and switch between their local dialect and English. The data set consists of 15 hours of speech derived from two public podcasts featuring native Emirati speakers, one of which is in the form of conversations between the host and a guest. Therefore, the collection contains examples of Emirati-English code-switching in both formal and natural conversational contexts. In this paper, we describe the process of data collection and annotation, and describe some of the features and statistics of the resulting data set. In addition, we evaluate the performance of pre-trained Arabic and multi-lingual ASR systems on our dataset, demonstrating the shortcomings of existing models on this low-resource dialectal Arabic, and the additional challenge of recognizing code-switching in ASR. The dataset will be made publicly available for research use.
