Mofetoluwa Adeyemi

ORCID: 0009-0003-2859-7136
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Text and Document Classification Technologies
  • Digital Humanities and Scholarship
  • Speech and dialogue systems
  • Language, Linguistics, Cultural Analysis
  • Genetics, Bioinformatics, and Biomedical Research
  • Semantic Web and Ontologies
  • Multimodal Machine Learning Applications
  • Data Quality and Management
  • African history and culture analysis
  • Translation Studies and Practices
  • Image Processing and 3D Reconstruction
  • Information Retrieval and Search Behavior
  • Bayesian Methods and Mixture Models
  • Interpreting and Communication in Healthcare

University of Waterloo
2022-2024

Leiden University
2022

Johns Hopkins University
2022

University of Washington
2022

Emmanuel College - Massachusetts
2022

Université d'Orléans
2022

Boston College
2022

Indiana University Bloomington
2022

The University of Melbourne
2022

Martin University
2022

With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In...

10.1162/tacl_a_00447 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both...

10.1162/tacl_a_00416 article EN cc-by Transactions of the Association for Computational Linguistics 2021-01-01

Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon Kabongo Kabenamualu, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade...

10.18653/v1/2020.findings-emnlp.195 article EN cc-by 2020-01-01

David Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, Victoire Memdjokam Koagne,...

10.18653/v1/2022.emnlp-main.298 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets focused primarily...

10.48550/arxiv.2305.06897 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Pretrained language models represent the state of the art in NLP, but the successful construction of such models often requires large amounts of data and computational resources. Thus, the paucity of data for low-resource languages impedes the development of robust NLP capabilities for these languages. There has been some recent success in pretraining encoder-only models solely on a combination of low-resource African languages, exemplified by AfriBERTa. In this work, we extend the approach of "small data" pretraining to encoder-decoder models. We introduce AfriTeVa,...

10.18653/v1/2022.deeplo-1.14 article EN cc-by 2022-01-01

Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. There are a total of 9,000 turns, each language having 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating...

10.1109/ijcnn54540.2023.10191208 article EN 2023 International Joint Conference on Neural Networks (IJCNN) 2023-06-18

Akintunde Oladipo, Mofetoluwa Adeyemi, Orevaoghene Ahia, Abraham Owodunni, Odunayo Ogundepo, David Adelani, Jimmy Lin. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

10.18653/v1/2023.emnlp-main.11 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, Jonathan Clark, Sebastian Ruder, David Adelani, Bonaventure Dossou, Abdou Diop, Claytone Sikasote, Gilles Hacheme, Happy Buzaaba, Ignatius Ezeani, Rooweither Mabuya, Salomey Osei, Chris Emezue, Albert Kahira, Shamsuddeen Muhammad, Akintunde Oladipo, Abraham Owodunni, Atnafu Tonja, Iyanuoluwa Shode, Akari Asai, Anuoluwapo Aremu, Ayodele Awokoya, Bernard Opoku, Chiamaka Chukwuneke, Christine Mwase, Clemencia Siro, Stephen Arthur, Tunde Ajayi,...

10.18653/v1/2023.findings-emnlp.997 article EN cc-by 2023-01-01

This paper provides a short overview of the CIRAL track at the Forum for Information Retrieval Evaluation (FIRE) 2023. The track focused on cross-lingual information retrieval (CLIR) between English and four African languages: Hausa, Somali, Swahili, and Yoruba. In a bid to promote CLIR research and curate a test collection for African languages, community evaluations were carried out via pooling. We briefly discuss the details of the task, dataset, relevance assessment, and results from the track in this paper.

10.1145/3632754.3633076 article EN 2023-12-15

10.1145/3626772.3657952 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

We participated in the WMT 2022 Large-Scale Machine Translation Evaluation for African Languages Shared Task. This work describes our approach, which is based on filtering the given noisy data using a sentence-pair classifier that was built by fine-tuning a pre-trained language model. To train the classifier, we obtain positive samples (i.e. high-quality parallel sentences) from a gold-standard curated dataset and extract negative samples (i.e. low-quality parallel sentences) from automatically aligned parallel data by choosing sentences with low alignment...

10.48550/arxiv.2210.10692 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate & analyze the effectiveness of modelling through...

10.48550/arxiv.2204.08083 preprint EN cc-by arXiv (Cornell University) 2022-01-01

African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an...

10.48550/arxiv.2210.12391 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Large language models (LLMs) have shown impressive zero-shot capabilities in various document reranking tasks. Despite their successful implementations, there is still a gap in existing literature on their effectiveness in low-resource languages. To address this gap, we investigate how LLMs function as rerankers in cross-lingual information retrieval (CLIR) systems for African languages. Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba), and we examine cross-lingual reranking with queries in English and passages in the...

10.48550/arxiv.2312.16159 preprint EN other-oa arXiv (Cornell University) 2023-01-01