Language Models: A Guide for the Perplexed

February 6, 2024

🔬 Research Summary by Sofia Serrano, a Ph.D. candidate in computer science at the University of Washington (who will be an Assistant Professor at Lafayette College starting in Autumn 2024), focusing specifically on the interpretability of contemporary natural language processing models.

[Original paper by Sofia Serrano, Zander Brumbaugh, and Noah A. Smith]


Overview: Language models are seemingly everywhere in the news, but explanations of them tend to be either very high-level or highly technical and geared toward experts. As natural language processing (NLP) researchers, we thought it would be helpful to write a guide for readers outside of NLP who want a more in-depth look at how language models work, the factors that have contributed to their recent development, and how they might continue to develop.


Introduction

Given the growing importance of AI literacy, we decided to write this tutorial on language models (LMs) to help narrow the gap between the discourse among those who study LMs—the core technology underlying ChatGPT and similar products—and those who are intrigued and want to learn more about them. In short, we believe the perspective of researchers and educators can clarify the public’s understanding of the technologies beyond what’s currently available, which tends to be either extremely technical or promotional material generated about products by their purveyors.

Our approach teases apart the concept of a language model (LM) from the products built on LMs, from the behaviors attributed to or desired from those products, and from claims about similarity to human cognition. As a starting point, we (1) offer a scientific viewpoint that focuses on questions amenable to study through experimentation; (2) situate language models as they are today in the context of the research that led to their development; and (3) describe the boundaries of what is known about these models as of this writing.

Key Insights

Tasks, Data, and Evaluation Methods

To understand the last few years’ developments around language models, it’s helpful to have some context about the research field that produced them. Therefore, we begin our guide by explaining how the field of NLP has typically approached building computer systems to work with text in the last couple of decades.

The first idea we discuss is how NLP researchers turn idealized things we’d like a computer to be able to do, like “have an understanding of grammar,” “write coherently,” or “translate between languages,” into a simplified problem on which we can begin to chip away. These simplified problems are known as “tasks” and turn a desired computer behavior like “translating between languages” into something more concrete like “given an English sentence, translate it into French.”

Crucially, there is a gap between idealized computer behavior and the “tasks” into which it is simplified. To use our translation example: anyone who’s read the same book in two different languages can tell you that there is an art to how human translators balance faithfulness to the original work against the conventions of the work’s new language in order to avoid stilted prose. That process often involves rearranging sentences slightly, so the two versions of a book rarely contain exactly the same number of sentences, and our distillation of “translating text between languages” into sentence-for-sentence translation obscures that. But making progress on that intermediate stepping stone of a task still helps make progress toward the larger goal.

We then discuss how deciding on a source of data and an evaluation method for a given simplified task lends itself to training neural network-based models that perform that task.

The “Language Modeling” Task: Next-Word Prediction

With all that said, what task have language models been trained to perform? As it turns out, their task is next-word prediction, which has been known in NLP for many years as “language modeling.” In other words, given some text in progress, like “This document is about natural language _____,” a language model is trained to predict the next word. (For our example, “processing” would be a reasonable guess.)
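
As a concrete illustration, the following sketch asks a small, publicly available language model for its top guesses at the next word. The choice of GPT-2 and the Hugging Face transformers library is ours, for illustration; the guide does not tie its discussion to any particular model or library. Note that models like GPT-2 actually predict subword tokens rather than whole words:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "This document is about natural language"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_tokens, vocab_size)

# The logits at the final position score every vocabulary item as a
# candidate next token; higher means the model considers it more likely.
top = torch.topk(logits[0, -1], k=5)
for token_id in top.indices.tolist():
    print(repr(tokenizer.decode(token_id)))  # " processing" should rank highly
```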

While language models have been around in NLP for a long time, it was only recently that researchers began to recognize that past a certain point, to do really well on language modeling, a language model needed to pick up certain facts and world knowledge (for example, to do well at filling in the blanks for “The Declaration of Independence was signed by the Second Continental Congress in the year ____,” or “When the boy received a birthday gift from his friends, he felt ____”).

But even today, the training of language models is still based on optimizing for low “perplexity”—that is, the same measure of a language model’s word-by-word “surprise” at the true, revealed continuation of text-in-progress that we’ve been using in NLP for decades.
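
To make “perplexity” concrete: it is the exponentiated average negative log-probability a model assigns to each true next word of a text, so a perplexity of k means the model was, on average, about as “surprised” as if it had been choosing uniformly among k equally likely next words at every step. Here is a minimal sketch of the computation, assuming we already have the probabilities the model assigned to each true next word (the function name and toy numbers are ours, for illustration):

```python
import math

def perplexity(probs):
    """Perplexity over a text, given the probability the model assigned
    to each true next word, in order: exp(-(1/N) * sum(log p_i))."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# A model that puts high probability on each true next word scores low
# (it is rarely "surprised") ...
print(perplexity([0.5, 0.8, 0.9]))    # ~1.41
# ... while an unsure model scores high.
print(perplexity([0.01, 0.02, 0.05])) # ~46.4
```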

Getting from Language Models to Today’s Large Language Models

While perplexity has continued to be our central quantity of interest for language models, that’s not to say that nothing has changed in the last few years about how language models are developed. We discuss two key changes: a move towards training on far more data and also the adoption of a type of neural network called the “transformer,” which is structured in such a way as to enable faster training on more data (provided a model developer has access to certain hardware—specifically GPUs—with a lot of memory).
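
To connect the transformer and its training back to perplexity, here is a sketch of the standard next-token training signal for a causal (left-to-right) language model, again using GPT-2 via the Hugging Face transformers library as an illustrative stand-in rather than anything prescribed by the guide:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokens = tokenizer("Language models predict the next word.",
                   return_tensors="pt").input_ids     # (1, seq_len)
logits = model(tokens).logits                         # (1, seq_len, vocab_size)

# The "shift by one" at the heart of training: the logits at positions
# 0..n-2 are scored against the true tokens at positions 1..n-1.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, logits.size(-1)),
    tokens[:, 1:].reshape(-1),
)

# Training adjusts the model's parameters to lower this loss over enormous
# amounts of text; exponentiating it recovers the (token-level) perplexity.
print(loss.item(), torch.exp(loss).item())
```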

We then discuss a few of the impacts of those changes and of the resulting surge in language modeling performance. For example, we discuss how language models are now commonly used to perform other “tasks” that would have involved separately trained models a few years ago, and how the move toward larger models has contributed to current NLP models’ relative inscrutability. We also talk about how the rising cost of developing new language models has considerably narrowed the field of organizations that can afford to produce them, the strategies those organizations currently use to adapt LMs into products, and how difficult it is to evaluate LMs.

Implications of How Language Models Work for Common Questions About Them

Based on our earlier discussion of how language models work, we address a few common questions about using them, including how much particular prompts matter and which aspects of language model output are essential to check. We also offer a bit of context for discussions around whether language models count as “intelligent,” though this is largely a side question for most people considering LMs.

Where Language Models are Headed

We close with some parting words about the difficulty of making projections about the future of LMs and about the developing regulatory landscape around them. Finally, we list a few helpful actions that readers of the guide can take to contribute to a healthy AI landscape.

Between the lines

Current language models are downright perplexing! By considering the trends in the research communities that produced them, we can better understand why these models behave as they do. Keeping in mind the primary task these models have been trained to accomplish, i.e., next-word prediction, also helps us understand how they work.

Many open questions about these models remain—ranging from how to steer models away from generating incorrect information to how best to customize models for different use cases to which strategies to use to democratize their development. However, we hope our tutorial can provide some helpful guidance on using and assessing LMs.

Though determining how these technologies will continue to develop is difficult, there are helpful actions that each of us can take to push that development in a positive direction. By broadening the number and type of people involved in decisions about model development and engaging in broader conversations about the role of LMs and AI in society, we can all help shape AI systems into a positive force.
