Montreal AI Ethics Institute

Democratizing AI ethics literacy


Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages (Research summary)

September 21, 2020

Summary contributed by our researcher Alexandrine Royer, who works at The Foundation for Genocide Education.

*Link to original paper + authors at the bottom.


Mini-summary: It is no secret that English has dominated the machine learning landscape. Yet multilingual researchers worldwide are trying to change the narrative and put their languages on the digital map. With machine learning research efforts springing up across Africa, which is home to over 1500 languages, it is difficult to coordinate and keep track of research happening in silos. Emezue and Dossou found that a significant hindrance to the advancement of machine translation (MT) research on African languages is the lack of a central database that gives potential users quick access to benchmarks and resources and enables them to build comparative models. The authors propose an open-source and publicly available database, titled Lanfrica, that will allow users from the scientific and non-scientific communities to catalog and track the latest research on machine learning developments in African languages.

Full summary:

English has become the lingua franca of machine learners and data scientists, yet fewer than 26% of internet users speak it. Against this trend, a growing number of initiatives aim to include African languages in machine translation research and, in particular, natural language processing for online platforms. Africa is the continent with the highest language diversity, being home to over 1500 documented languages, and over 40% of its population uses social media platforms. To keep track of these ongoing developments, Emezue and Dossou offer Lanfrica, a participatory framework for documenting research, projects, benchmarks, and datasets on African languages.

As Emezue and Dossou point out, several online communities are already dedicated to promoting AI research in Africa, such as Masakhane, Deep Learning Indaba, Black in AI and Zindi. These organizations reflect a desire not only to put Africa forward in machine learning but also to preserve the continent’s distinct cultures within the digital space. Several limitations currently hinder the advancement of natural language processing for African languages, including:

  • A lack of confidence from African societies that their languages can be a prevalent mode of communication in the future 
  • A lack of resources for African languages
  • A lack of publicly available benchmarks 
  • Minimal sharing of existing research and code

To redress the lack of discoverability, publicly available benchmarks, and resource sharing, Emezue and Dossou created an open-source, user-friendly database system that documents machine learning research, results, benchmarks, and projects on African languages. By surveying the Masakhane community, an open-source group of NLP researchers, the authors found that researchers building a neural machine translation (NMT) model had difficulty accessing model comparisons to guide them in data preparation, model configuration, training, and evaluation.

The soon-to-be-launched Lanfrica website will catalog ongoing ML research efforts by African language of interest and allow users to submit information on their projects, with contributions coming from researchers and non-researchers alike. To improve ML reproducibility, the website will feature links to open-source test data.

Despite being a growing hub of ML research, Africa is underrepresented in discussions surrounding AI, often overshadowed by academic and corporate research labs in wealthy enclaves such as Silicon Valley and Zhongguancun. Digital assistants like Siri, Google Talk, and Alexa have yet to accommodate widely spoken languages such as Lingala, Oromo, and Swahili, and Google Translate offers translations for only 13 African languages. Unlike large databases such as Google Scholar, Lanfrica is tailored specifically to African language researchers, allowing them to build networks in a digital space that reflects their interests and priorities. Because Africa is the most linguistically diverse place on Earth, natural language processing researchers in North America and Asia can also benefit from learning about the advances made there.


Original paper by Chris C. Emezue, Bonaventure F.P. Dossou: https://arxiv.org/pdf/2008.07302.pdf

  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.