• Skip to main content
  • Skip to secondary menu
  • Skip to primary sidebar
  • Skip to footer
Montreal AI Ethics Institute

Montreal AI Ethics Institute

Democratizing AI ethics literacy

  • Articles
    • Public Policy
    • Privacy & Security
    • Human Rights
      • Ethics
      • JEDI (Justice, Equity, Diversity, Inclusion
    • Climate
    • Design
      • Emerging Technology
    • Application & Adoption
      • Health
      • Education
      • Government
        • Military
        • Public Works
      • Labour
    • Arts & Culture
      • Film & TV
      • Music
      • Pop Culture
      • Digital Art
  • Columns
    • AI Policy Corner
    • Recess
    • Tech Futures
  • The AI Ethics Brief
  • AI Literacy
    • Research Summaries
    • AI Ethics Living Dictionary
    • Learning Community
  • The State of AI Ethics Report
    • Volume 7 (November 2025)
    • Volume 6 (February 2022)
    • Volume 5 (July 2021)
    • Volume 4 (April 2021)
    • Volume 3 (Jan 2021)
    • Volume 2 (Oct 2020)
    • Volume 1 (June 2020)
  • About
    • Our Contributions Policy
    • Our Open Access Policy
    • Contact
    • Donate

Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages (Research summary)

September 21, 2020

Summary contributed by our researcher Alexandrine Royer, who works at The Foundation for Genocide Education.

*Link to original paper + authors at the bottom.


Mini-summary: It is no secret that English has dominated the machine learning landscape. Yet, multilingual researchers worldwide are trying to change the narrative and put their language on the digital map. With machine learning research efforts springing up across the continent, which is home to over 1500 languages, it is difficult to coordinate and keep track of current research happening in silos. Emezue et Dossou found that a significant hindrance to the advancement of MT research on African languages is the lack of a central database that gives potential users quick access to benchmarks and resources and enables them to build comparative models. The authors propose an open-source and publicly available database, titled Lanafrica, that will allow users from the scientific and non-scientific community to catalog and track the latest research on machine learning developments in African languages.

Full summary:

English has become the lingua franca of machine learners and data scientists, yet a minority of fewer than 26% of internet users speak it. Against this trend, there have been a growing number of initiatives to include African languages in machine translation research, and in particular, natural learning processes for online platforms. Africa is the continent with the highest language diversity, being home to over 1500 documented languages, and over 40% of its population uses social media platforms. To keep track of these ongoing developments, Emezue et Dossou offers Lanfrica a participatory-led framework in documenting researches, projects, benchmarks, and datasets on African languages.  

As Emezue et Dossou points out, there are already several existing online communities dedicated to promoting AI research in Africa, such as Masakhane, Deep Learning Indaba, BlackinAI and Zindi. These organizations reflect not only a desire to put Africa forward in machine learning but also to preserve the continent’s distinct cultures within the digital space. Some limitations currently hinder the advancement of African natural language processes, including: 

  • A lack of confidence from African societies that their languages can be a prevalent mode of communication in the future 
  • A lack of resources for African languages
  • A lack of publicly available benchmarks 
  • Minimal sharing of existing research and code

To redress these issues of lack discoverability, publicly available benchmarks, and sharing of resources, Emezue et Dossou created an open-source and user-friendly database system that documents machine learning researches, research-results, benchmarks, and projects on African languages. By surveying the Masakhane community, an open-source group of NLP researchers, the authors found that to build a neural machine translation (NMT) model, researchers had difficulty accessing model comparisons to guide them in data preparation, model configuration, training, and evaluation. 

The soon-to-be-launched Lanafrica website will catalog ongoing ML research efforts based on the African language of interest and allow users to submit information on their projects, with contributions coming from both researchers and non-researchers alike. To improve ML reproducibility, links that provide access to open-source test data will be featured on the website. 

Despite being a growing pole of ML research, Africa is underrepresented in discussions surrounding AI, often overshadowed by academic and corporate research labs in wealthy bubbles such as Silicon Valley and Zhongguancun. Digital assistants like Siri, Google Talk, and Alexa have yet to be programmed to accommodate widely-spoken languages such as Lingala, Oromo, and Swahili, and Google Translate only offers translations for 13 African languages. Unlike large databases such as Google scholar, Lanafrica is an initiative that is specifically tailored to African language researchers, allowing them to build networks in a digital space that reflects their interests and priorities. As the most linguistically diverse place on Earth, natural language machine learners in North America and Asia can also benefit from learning about the advances in Africa. 


Original paper by Chris C. Emezue, Bonaventure F.P. Dossou: https://arxiv.org/pdf/2008.07302.pdf

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.

Primary Sidebar

🔍 SEARCH

Spotlight

A network diagram with lots of little emojis, organised in clusters.

Tech Futures: AI For and Against Knowledge

A brightly coloured illustration which can be viewed in any direction. It has many elements to it working together: men in suits around a table, someone in a data centre, big hands controlling the scenes and holding a phone, people in a production line. Motifs such as network diagrams and melting emojis are placed throughout the busy vignettes.

Tech Futures: The Fossil Fuels Playbook for Big Tech: Part II

A rock embedded with intricate circuit board patterns, held delicately by pale hands drawn in a ghostly style. The contrast between the rough, metallic mineral and the sleek, artificial circuit board illustrates the relationship between raw natural resources and modern technological development. The hands evoke human involvement in the extraction and manufacturing processes.

Tech Futures: The Fossil Fuels Playbook for Big Tech: Part I

Close-up of a cat sleeping on a computer keyboard

Tech Futures: The threat of AI-generated code to the world’s digital infrastructure

The undying sun hangs in the sky, as people gather around signal towers, working through their digital devices.

Dreams and Realities in Modi’s AI Impact Summit

related posts

  • Trustworthiness of Artificial Intelligence

    Trustworthiness of Artificial Intelligence

  • Beyond Bias and Discrimination: Redefining the AI Ethics Principle of Fairness in Healthcare Machine...

    Beyond Bias and Discrimination: Redefining the AI Ethics Principle of Fairness in Healthcare Machine...

  • Brave: what it means to be an AI Ethicist

    Brave: what it means to be an AI Ethicist

  • Research summary: Aligning Super Human AI with Human Behavior: Chess as a Model System

    Research summary: Aligning Super Human AI with Human Behavior: Chess as a Model System

  • The State of AI Ethics Report (Volume 4)

    The State of AI Ethics Report (Volume 4)

  • Equal Improvability: A New Fairness Notion Considering the Long-term Impact

    Equal Improvability: A New Fairness Notion Considering the Long-term Impact

  • The Impact of AI Art on the Creative Industry

    The Impact of AI Art on the Creative Industry

  • The State of AI Ethics Report (Jan 2021)

    The State of AI Ethics Report (Jan 2021)

  • Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for G...

    Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for G...

  • The Abuse and Misogynoir Playbook, explained

    The Abuse and Misogynoir Playbook, explained

Partners

  •  
    U.S. Artificial Intelligence Safety Institute Consortium (AISIC) at NIST

  • Partnership on AI

  • The LF AI & Data Foundation

  • The AI Alliance

Footer


Articles

Columns

AI Literacy

The State of AI Ethics Report


 

About Us


Founded in 2018, the Montreal AI Ethics Institute (MAIEI) is an international non-profit organization equipping citizens concerned about artificial intelligence and its impact on society to take action.

Contact

Donate


  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.
  • Learn more about our open access policy here.
  • Creative Commons License

    Save hours of work and stay on top of Responsible AI research and reporting with our bi-weekly email newsletter.