Montreal AI Ethics Institute

Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages (Research summary)

September 21, 2020

Summary contributed by our researcher Alexandrine Royer, who works at The Foundation for Genocide Education.

*Link to original paper + authors at the bottom.


Mini-summary: It is no secret that English has dominated the machine learning landscape. Yet multilingual researchers worldwide are trying to change the narrative and put their languages on the digital map. Machine learning research efforts are springing up across Africa, a continent home to over 1500 languages, and this research often happens in silos that are difficult to coordinate and keep track of. Emezue and Dossou found that a significant hindrance to the advancement of machine translation (MT) research on African languages is the lack of a central database that gives potential users quick access to benchmarks and resources and enables them to build comparative models. The authors propose an open-source, publicly available database, titled Lanfrica, that will allow users from the scientific and non-scientific communities to catalog and track the latest machine learning research on African languages.

Full summary:

English has become the lingua franca of machine learning practitioners and data scientists, yet fewer than 26% of internet users speak it. Against this trend, a growing number of initiatives seek to include African languages in machine translation research and, more broadly, in natural language processing (NLP) for online platforms. Africa is the continent with the highest language diversity, home to over 1500 documented languages, and over 40% of its population uses social media platforms. To keep track of these ongoing developments, Emezue and Dossou propose Lanfrica, a participatory framework for documenting research, projects, benchmarks, and datasets on African languages.

As Emezue and Dossou point out, several online communities are already dedicated to promoting AI research in Africa, such as Masakhane, Deep Learning Indaba, Black in AI, and Zindi. These organizations reflect not only a desire to put Africa forward in machine learning but also a desire to preserve the continent's distinct cultures within the digital space. Several limitations currently hinder the advancement of NLP for African languages, including:

  • A lack of confidence among African societies that their languages can be a prevalent mode of communication in the future
  • A lack of resources for African languages
  • A lack of publicly available benchmarks
  • Minimal sharing of existing research and code

To address the lack of discoverability, publicly available benchmarks, and resource sharing, Emezue and Dossou created an open-source, user-friendly database system that documents machine learning research, research results, benchmarks, and projects on African languages. By surveying the Masakhane community, an open-source group of NLP researchers, the authors found that researchers building neural machine translation (NMT) models had difficulty accessing model comparisons to guide them through data preparation, model configuration, training, and evaluation.

The soon-to-be-launched Lanfrica website will catalog ongoing ML research efforts by African language of interest and allow users to submit information on their projects, with contributions coming from researchers and non-researchers alike. To improve ML reproducibility, the website will feature links to open-source test data.
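The summary does not describe Lanfrica's actual schema or implementation. As a rough illustration only, here is a minimal Python sketch of a catalog indexed by African language of interest that accepts community submissions and answers lookups. The classes `Entry` and `Catalog`, the field names (`language`, `kind`, `url`, `submitted_by`), the allowed entry kinds, and the example entries and URLs are all hypothetical assumptions, not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One catalogued item. Hypothetical schema, not Lanfrica's actual one."""
    title: str
    language: str       # African language of interest, e.g. "Swahili"
    kind: str           # assumed kinds: "paper", "dataset", "benchmark", "project"
    url: str            # link to the resource, e.g. open-source test data
    submitted_by: str   # contributor role: researcher or non-researcher


class Catalog:
    """Hypothetical language-indexed catalog of research entries."""

    KINDS = {"paper", "dataset", "benchmark", "project"}

    def __init__(self) -> None:
        # Map a normalized language name to every entry submitted for it.
        self._by_language: dict = defaultdict(list)

    @staticmethod
    def _key(language: str) -> str:
        # Normalize so "Swahili", " swahili " and "SWAHILI" share one key.
        return language.strip().lower()

    def submit(self, entry: Entry) -> None:
        """Accept a community submission after a basic validity check."""
        if entry.kind not in self.KINDS:
            raise ValueError(f"unknown kind: {entry.kind}")
        self._by_language[self._key(entry.language)].append(entry)

    def by_language(self, language: str, kind: Optional[str] = None) -> list:
        """Return entries for a language, optionally filtered by kind."""
        # .get() avoids creating empty keys in the defaultdict on lookup.
        entries = self._by_language.get(self._key(language), [])
        return [e for e in entries if kind is None or e.kind == kind]


if __name__ == "__main__":
    catalog = Catalog()
    # Hypothetical example submissions with placeholder URLs.
    catalog.submit(Entry("Example Swahili-English test set", "Swahili",
                         "dataset", "https://example.org/sw-test", "community member"))
    catalog.submit(Entry("Example Swahili NMT baseline", "Swahili",
                         "benchmark", "https://example.org/sw-baseline", "researcher"))
    print([e.title for e in catalog.by_language("swahili", kind="dataset")])
    # ['Example Swahili-English test set']
```

Indexing by a normalized language name keeps lookups for a given language of interest cheap, while the `kind` filter lets a user narrow results to, say, benchmarks when looking for model comparisons.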

Despite being a growing hub of ML research, Africa is underrepresented in discussions surrounding AI, often overshadowed by academic and corporate research labs in wealthy bubbles such as Silicon Valley and Zhongguancun. Digital assistants like Siri, Google Talk, and Alexa have yet to accommodate widely spoken languages such as Lingala, Oromo, and Swahili, and Google Translate offers translations for only 13 African languages. Unlike large databases such as Google Scholar, Lanfrica is specifically tailored to African language researchers, allowing them to build networks in a digital space that reflects their interests and priorities. And because Africa is the most linguistically diverse place on Earth, NLP researchers in North America and Asia can also benefit from learning about its advances.


Original paper by Chris C. Emezue, Bonaventure F.P. Dossou: https://arxiv.org/pdf/2008.07302.pdf




  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.