NFDI4DS | UHH-SEMS - Publication Details

Mofetoluwa Adeyemi

ORCID: 0009-0003-2859-7136

About

Contact & Profiles

A5076302977

Research Areas

Natural Language Processing Techniques
Topic Modeling
Text Readability and Simplification
Text and Document Classification Technologies
Digital Humanities and Scholarship
Speech and dialogue systems
Language, Linguistics, Cultural Analysis
Genetics, Bioinformatics, and Biomedical Research
Semantic Web and Ontologies
Multimodal Machine Learning Applications
Data Quality and Management
African history and culture analysis
Translation Studies and Practices
Image Processing and 3D Reconstruction
Information Retrieval and Search Behavior
Bayesian Methods and Mixture Models
Interpreting and Communication in Healthcare

University of Waterloo
2022-2024

Leiden University
2022

Johns Hopkins University
2022

University of Washington
2022

Emmanuel College - Massachusetts
2022

Université d'Orléans
2022

Boston College
2022

Indiana University Bloomington
2022

The University of Melbourne
2022

Martin University
2022

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

OPENALEX - Publications

Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, and 47 more

With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In...

10.1162/tacl_a_00447 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

MasakhaNER: Named Entity Recognition for African Languages

David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D’souza, Julia Kreutzer, and 56 more

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both...

10.1162/tacl_a_00416 article EN cc-by Transactions of the Association for Computational Linguistics 2021-01-01

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, and 42 more

Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon Kabongo Kabenamualu, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade...

10.18653/v1/2020.findings-emnlp.195 article EN cc-by 2020-01-01

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, and 40 more

David Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, Victoire Memdjokam Koagne,...

10.18653/v1/2022.emnlp-main.298 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara E. Rivera, Jonathan H. Clark, Sebastian Ruder, and 47 more

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily...

10.48550/arxiv.2305.06897 preprint EN cc-by arXiv (Cornell University) 2023-01-01

AfriTeVA: Extending "Small Data" Pretraining Approaches to Sequence-to-Sequence Models

Odunayo Jude Ogundepo, Akintunde Oladipo, Mofetoluwa Adeyemi, Kelechi Ogueji, and Jimmy Lin

Pretrained language models represent the state of the art in NLP, but the successful construction of such models often requires large amounts of data and computational resources. Thus, the paucity of data for low-resource languages impedes the development of robust NLP capabilities for these languages. There has been some recent success in pretraining encoder-only models solely on a combination of low-resource African languages, exemplified by AfriBERTa. In this work, we extend the approach of "small data" pretraining to encoder-decoder models. We introduce AfriTeVa,...

10.18653/v1/2022.deeplo-1.14 article EN cc-by 2022-01-01

AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages

Tosin Adewumi, Mofetoluwa Adeyemi, Anuoluwapo Aremu, Bukola Peters, Happy Buzaaba, and 15 more

Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. There are a total of 9,000 turns, each language having 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating...

10.1109/ijcnn54540.2023.10191208 article EN 2023 International Joint Conference on Neural Networks (IJCNN) 2023-06-18

Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages

Mofetoluwa Adeyemi, Akintunde Oladipo, Ronak Pradeep, Jimmy Lin

10.18653/v1/2024.acl-short.59 article EN 2024-01-01

Better Quality Pre-training Data and T5 Models for African Languages

Akintunde Oladipo, Mofetoluwa Adeyemi, Orevaoghene Ahia, Abraham Owodunni, Odunayo Ogundepo, and 2 more

Akintunde Oladipo, Mofetoluwa Adeyemi, Orevaoghene Ahia, Abraham Owodunni, Odunayo Ogundepo, David Adelani, Jimmy Lin. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

10.18653/v1/2023.emnlp-main.11 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Cross-lingual Open-Retrieval Question Answering for African Languages

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, Jonathan Clark, Sebastian Ruder, and 39 more

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, Jonathan Clark, Sebastian Ruder, David Adelani, Bonaventure Dossou, Abdou Diop, Claytone Sikasote, Gilles Hacheme, Happy Buzaaba, Ignatius Ezeani, Rooweither Mabuya, Salomey Osei, Chris Emezue, Albert Kahira, Shamsuddeen Muhammad, Akintunde Oladipo, Abraham Owodunni, Atnafu Tonja, Iyanuoluwa Shode, Akari Asai, Anuoluwapo Aremu, Ayodele Awokoya, Bernard Opoku, Chiamaka Chukwuneke, Christine Mwase, Clemencia Siro, Stephen Arthur, Tunde Ajayi,...

10.18653/v1/2023.findings-emnlp.997 article EN cc-by 2023-01-01

CIRAL at FIRE 2023: Cross-Lingual Information Retrieval for African Languages

Mofetoluwa Adeyemi, Akintunde Oladipo, Xinyu Zhang, David Alfonso-Hermelo, Mehdi Rezagholizadeh, and 2 more

This paper provides a short overview of the CIRAL track at the Forum for Information Retrieval Evaluation (FIRE) 2023. The track focused on cross-lingual information retrieval (CLIR) between English and four African languages which include Hausa, Somali, Swahili, and Yoruba. In a bid to promote CLIR research and curate a test collection for African languages, community evaluations were carried out via pooling. We briefly discuss the details of the task, dataset, relevance assessment and results from the track in this paper.

10.1145/3632754.3633076 article EN 2023-12-15

MasakhaNER: Named entity recognition for African languages

David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D’souza, Julia Kreutzer, and 56 more

10.1162/tacl article EN other-oa 2021-06-14

On Backbones and Training Regimes for Dense Retrieval in African Languages

Akintunde Oladipo, Mofetoluwa Adeyemi, Jimmy Lin

10.1145/3626772.3657952 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

CIRAL: A Test Collection for CLIR Evaluations in African Languages

Mofetoluwa Adeyemi, Akintunde Oladipo, Xinyu Zhang, David Alfonso-Hermelo, Mehdi Rezagholizadeh, and 18 more

10.1145/3626772.3657884 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages

Idris Abdulmumin, Michael Beukman, Jesujoba O. Alabi, Chris Chinenye Emezue, Everlyn Asiko, and 6 more

We participated in the WMT 2022 Large-Scale Machine Translation Evaluation for African Languages Shared Task. This work describes our approach, which is based on filtering the given noisy data using a sentence-pair classifier that was built by fine-tuning a pre-trained language model. To train the classifier, we obtain positive samples (i.e. high-quality parallel sentences) from a gold-standard curated dataset and extract negative samples (i.e. low-quality parallel sentences) from automatically aligned parallel data by choosing sentences with low alignment...

10.48550/arxiv.2210.10692 preprint EN cc-by arXiv (Cornell University) 2022-01-01

AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages

Tosin Adewumi, Mofetoluwa Adeyemi, Anuoluwapo Aremu, Bukola Peters, Happy Buzaaba, and 15 more

Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through...

10.48550/arxiv.2204.08083 preprint EN cc-by arXiv (Cornell University) 2022-01-01

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, and 40 more

African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an...

10.48550/arxiv.2210.12391 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages

Mofetoluwa Adeyemi, Akintunde Oladipo, Ronak Pradeep, Jimmy Lin

Large language models (LLMs) have shown impressive zero-shot capabilities in various document reranking tasks. Despite their successful implementations, there is still a gap in existing literature on their effectiveness in low-resource languages. To address this gap, we investigate how LLMs function as rerankers in cross-lingual information retrieval (CLIR) systems for African languages. Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba) and we examine cross-lingual reranking with queries in English and passages in the...

10.48550/arxiv.2312.16159 preprint EN other-oa arXiv (Cornell University) 2023-01-01