Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

DOI: 10.48550/arxiv.2210.09984
Publication Date: 2022-01-01
ABSTRACT
MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world. These languages have diverse typologies, originate from many different language families, and are associated with varying amounts of available resources, including what researchers typically characterize as high-resource as well as low-resource languages. Our dataset is designed to support the creation and evaluation of models for monolingual retrieval, where the queries and the corpora are in the same language. In total, we have gathered over 700k high-quality relevance judgments for around 77k queries over Wikipedia in these 18 languages, where all assessments have been performed by native speakers hired by our team. Our goal is to spur research that will improve retrieval across a continuum of languages, thus enhancing information access capabilities for diverse populations around the world, particularly those that have been traditionally underserved. This overview paper describes the dataset and baselines that we share with the community. The MIRACL website is live at http://miracl.ai/.