Jade Abbott

ORCID: 0000-0001-6061-0888
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Health, Environment, Cognitive Aging
  • Biomedical and Engineering Education
  • Genetics, Bioinformatics, and Biomedical Research
  • Metaheuristic Optimization Algorithms Research
  • Translation Studies and Practices
  • Multimodal Machine Learning Applications
  • Text Readability and Simplification
  • Software Engineering Research
  • Wikis in Education and Collaboration
  • Speech Recognition and Synthesis
  • Social and Intergroup Psychology
  • Robotics and Automated Systems
  • Image Processing and 3D Reconstruction
  • Semantic Web and Ontologies
  • Climate Change Communication and Perception
  • ICT in Developing Communities
  • Language, Linguistics, Cultural Analysis
  • Computational and Text Analysis Methods
  • Digital Humanities and Scholarship
  • Text and Document Classification Technologies
  • Insect and Arachnid Ecology and Behavior
  • Language, Metaphor, and Cognition
  • Advanced Multi-Objective Optimization Algorithms

Carnegie Mellon University
2022-2023

The University of Melbourne
2022-2023

Applied Mathematics (United States)
2023

Karlsruhe Institute of Technology
2023

Minzu University of China
2023

Dublin City University
2023

Fondazione Bruno Kessler
2023

University of Trento
2023

University of the Witwatersrand
2023

SIL International
2022

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both...

10.1162/tacl_a_00416 article EN cc-by Transactions of the Association for Computational Linguistics 2021-01-01

Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon Kabongo Kabenamualu, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade...

10.18653/v1/2020.findings-emnlp.195 article EN cc-by 2020-01-01

David Adelani, Jesujoba Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter, Dietrich Klakow, Peter Nabende, Ernie Chang, Tajuddeen Gwadabe, Freshia Sackey, Bonaventure F. P. Dossou, Chris Emezue, Colin Leong, Michael Beukman, Shamsuddeen Muhammad, Guyo Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Wairagala, Muhammad Umair Nasir, Benjamin Ajibade, Tunde Ajayi, Yvonne Gitau, Jade Abbott, Mohamed Ahmed, Millicent Ochieng, Anuoluwapo Aremu, Perez...

10.18653/v1/2022.naacl-main.223 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

African languages are numerous, complex and low-resourced. The datasets required for machine translation are difficult to discover, and existing research is hard to reproduce. Minimal attention has been given to machine translation for African languages, so there is scant research regarding the problems that arise when using machine translation techniques. To begin addressing these problems, we trained models to translate English to five of the official South African languages (Afrikaans, isiZulu, Northern Sotho, Setswana, Xitsonga), making use of modern neural machine translation techniques. The results obtained show the promise of using neural machine translation techniques...

10.48550/arxiv.1906.05685 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including: a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers and no benchmarks to compare techniques. To begin to address the identified problems, MASAKHANE, an open-source, continent-wide, distributed, online research effort...

10.48550/arxiv.2003.11529 preprint EN other-oa arXiv (Cornell University) 2020-01-01

After decades of political, economic, and scientific efforts, humanity has not gotten any closer to global sustainability. With less than a decade to reach the UN Sustainable Development Goals (SDGs) deadline of the 2030 Agenda, we show that global development agendas may be getting lost in translation, from their initial formulation to their final implementation. Sustainability science does not “speak” most of the 2000 languages of Africa, where the lack of indigenous terminology hinders global efforts such as the COVID-19 pandemic fight....

10.3390/su14138133 article EN Sustainability 2022-07-04

There has been a rise in natural language processing (NLP) communities across the African continent (Masakhane, AfricaNLP workshops). With this momentum noted, and given the existing power asymmetries that plague the African continent, there is an urgent need to ensure that these technologies move toward shared goals between organizations and stakeholders, not only to improve the representation of African languages in cutting-edge NLP research but also to ensure that NLP research enables technological advances toward human dignity, well-being, equity...

10.1016/j.patter.2023.100820 article EN cc-by Patterns 2023-08-01

Advances in speech and language technologies enable tools such as voice-search, text-to-speech, speech recognition and machine translation. These are however only available for high-resource languages like English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D - African Language Program, a 3-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an...

10.48550/arxiv.2104.02516 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Unlike major Western languages, most African languages are very low-resourced. Furthermore, the resources that do exist are often scattered and difficult to obtain and discover. As a result, the data and code for existing research have rarely been shared. This has led to a struggle to reproduce reported results, and few publicly available benchmarks for African machine translation models exist. To start to address these problems, we trained neural machine translation models for 5 Southern African languages on publicly-available datasets. Code is provided for training the models and evaluating them on a newly...

10.48550/arxiv.1906.10511 preprint EN cc-by arXiv (Cornell University) 2019-01-01

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As...

10.48550/arxiv.2010.02353 preprint EN other-oa arXiv (Cornell University) 2020-01-01

High-resource language models often fall short in the African context, where there is a critical need for models that are efficient, accessible, and locally relevant, even amidst significant computing and data constraints. This paper introduces InkubaLM, a small language model with 0.4 billion parameters, which achieves performance comparable to models with significantly larger parameter counts and more extensive training data on tasks such as machine translation, question-answering, AfriMMLU, and the AfriXnli task. Notably, InkubaLM...

10.48550/arxiv.2408.17024 preprint EN arXiv (Cornell University) 2024-08-30

In this study, we investigate the effectiveness of using cross-lingual word embeddings for zero-shot transfer learning between a language with abundant resources, English, and a language with limited resources, isiZulu. IsiZulu is part of the South African Nguni language family, which is characterised by complex agglutinating morphology. We use VecMap, an open source tool, to obtain cross-lingual word embeddings. To perform an extrinsic evaluation of the effectiveness of the embeddings, we train a news classifier on labelled English data in order to categorise unlabelled isiZulu...

10.18653/v1/2023.rail-1.2 article EN cc-by 2023-01-01