Montreal AI Ethics Institute

Democratizing AI ethics literacy

Language Models: A Guide for the Perplexed

February 6, 2024

🔬 Research Summary by Sofia Serrano, a Ph.D. candidate in computer science at the University of Washington (and incoming Assistant Professor at Lafayette College starting in Autumn 2024), focusing on the interpretability of contemporary natural language processing models.

[Original paper by Sofia Serrano, Zander Brumbaugh, and Noah A. Smith]


Overview: Language models are seemingly everywhere in the news, but explanations of them tend to be either very high-level or highly technical and geared toward experts. As natural language processing (NLP) researchers, we thought it would be helpful to write a guide for readers outside NLP who want a more in-depth look at how language models work, the factors that have contributed to their recent development, and how they might continue to develop.


Introduction

Given the growing importance of AI literacy, we decided to write this tutorial on language models (LMs) to help narrow the gap between the discourse among those who study LMs—the core technology underlying ChatGPT and similar products—and those who are intrigued and want to learn more about them. In short, we believe the perspective of researchers and educators can clarify the public’s understanding of the technologies beyond what’s currently available, which tends to be either extremely technical or promotional material generated about products by their purveyors.

Our approach teases apart the concept of a language model (LM) from the products built on LMs, from the behaviors attributed to or desired from those products, and from claims about similarity to human cognition. As a starting point, we (1) offer a scientific viewpoint that focuses on questions amenable to study through experimentation; (2) situate language models as they are today in the context of the research that led to their development; and (3) describe the boundaries of what is known about the models at this writing.

Key Insights

Tasks, Data, and Evaluation Methods

To understand the last few years’ developments around language models, it’s helpful to have some context about the research field that produced them. Therefore, we begin our guide by explaining how, over the last couple of decades, the field of NLP has typically approached building computer systems that work with text.

The first idea we discuss is how NLP researchers turn idealized things we’d like a computer to be able to do, like “have an understanding of grammar,” “write coherently,” or “translate between languages,” into a simplified problem on which we can begin to chip away. These simplified problems are known as “tasks” and turn a desired computer behavior like “translating between languages” into something more concrete like “given an English sentence, translate it into French.”

Crucially, there is a gap between idealized computer behavior and the “tasks” it is simplified into. To use our translation example, anyone who has read the same book in two different languages can tell you that there is an art to how human translators balance faithfulness to the original work against the conventions of the work’s new language, to avoid stilted prose. This often involves rearranging sentences, so the two versions of the book may not contain exactly the same number of sentences, and our distillation of “translating text between languages” into sentence-for-sentence translation obscures that. But making progress on that intermediate stepping stone of a task helps make progress towards the larger goal.

We then discuss how settling on a source of data and an evaluation method for a given simplified task sets the stage for training neural network-based models to perform it.

The “Language Modeling” Task: Next-Word Prediction

With all that said, what task have language models been trained to perform? As it turns out, their task is next-word prediction, which has already been known for many years as “language modeling” in NLP. In other words, given some text in progress, like “This document is about natural language _____,” a language model is trained to try to predict the next word. (For our example, “processing” would be a reasonable guess.)
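The next-word-prediction setup described above can be sketched with a toy example. The following is a minimal illustration (not how modern neural LMs work internally, which involve learned parameters rather than raw counts): it estimates next-word probabilities from bigram counts over a tiny made-up corpus, and the corpus and function name are our own inventions.

```python
from collections import Counter, defaultdict

# Toy corpus, already split into words. Real LMs train on vastly more text.
corpus = (
    "natural language processing is fun . "
    "natural language models predict the next word . "
    "language models are trained on text ."
).split()

# Count how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_distribution(prev_word):
    """Return the estimated probability of each possible next word."""
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_distribution("language"))
# → {'processing': 1/3, 'models': 2/3} given this toy corpus
```

Given some text in progress, the model’s “guess” for the blank is simply the word with the highest estimated probability; neural language models produce an analogous distribution over their entire vocabulary at every step.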

While language models have been around in NLP for a long time, it was only recently that researchers began to recognize that past a certain point, to do really well on language modeling, a language model needed to pick up certain facts and world knowledge (for example, to do well at filling in the blanks for “The Declaration of Independence was signed by the Second Continental Congress in the year ____,” or “When the boy received a birthday gift from his friends, he felt ____”).

But even today, the training of language models is still based on optimizing for low “perplexity”—that is, the same measure of a language model’s word-by-word “surprise” at the true, revealed continuation of text-in-progress that we’ve been using in NLP for decades.
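The perplexity measure mentioned above can be made concrete with a short sketch: it is the exponentiated average negative log-probability that the model assigns to each true next word, so lower perplexity means less word-by-word “surprise.” The per-step probabilities below are made-up numbers, purely for illustration.

```python
import math

# Probability the model assigned to the *true* next word at each step.
# These values are invented for illustration.
step_probs = [0.25, 0.1, 0.5, 0.05]

# Average negative log-probability per word, then exponentiate.
avg_neg_log_prob = -sum(math.log(p) for p in step_probs) / len(step_probs)
perplexity = math.exp(avg_neg_log_prob)

print(round(perplexity, 2))  # 6.32
```

A hypothetical model that always assigned probability 1 to the true next word would achieve the minimum possible perplexity of 1; training a language model amounts to pushing this number down on held-out text.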

Getting from Language Models to Today’s Large Language Models

While perplexity has continued to be our central quantity of interest for language models, that’s not to say that nothing has changed in the last few years about how language models are developed. We discuss two key changes: a move towards training on far more data and also the adoption of a type of neural network called the “transformer,” which is structured in such a way as to enable faster training on more data (provided a model developer has access to certain hardware—specifically GPUs—with a lot of memory).

We then discuss a few impacts of those changes and of the resulting surge in language modeling performance. For example, we discuss how language models are now commonly used to perform other “tasks” that would have required separately trained models a few years ago, and how the move towards larger models has contributed to current NLP models’ relative inscrutability. We also talk about how the rising cost of developing new language models has considerably narrowed the field of organizations that can afford to produce them, the strategies those organizations currently use to adapt LMs for use as products, and how difficult LMs are to evaluate.

Implications of How Language Models Work for Common Questions About Them

Based on our earlier discussion of how language models work, we address a few common questions about using them, including how much particular prompts matter and which kinds of claims in language model output are essential to check. We also offer some context for discussions of whether language models count as “intelligent,” though for most people considering LMs this is largely a side question.

Where Language Models are Headed

We close with some parting words about the difficulty of making projections about the future of LMs and about the developing regulatory landscape around them. Finally, we list a few helpful actions that readers of the guide can take to contribute to a healthy AI landscape.

Between the lines

Current language models are downright perplexing! Considering the trends in the research communities that produced them helps us understand why these models behave as they do. Keeping in mind the primary task these models have been trained to accomplish, i.e., next-word prediction, also helps us understand how they work.

Many open questions about these models remain—ranging from how to steer models away from generating incorrect information to how best to customize models for different use cases to which strategies to use to democratize their development. However, we hope our tutorial can provide some helpful guidance on using and assessing LMs.

Though determining how these technologies will continue to develop is difficult, there are helpful actions that each of us can take to push that development in a positive direction. By broadening the number and type of people involved in decisions about model development and engaging in broader conversations about the role of LMs and AI in society, we can all help shape AI systems into a positive force.


© 2025 Montreal AI Ethics Institute. This work is licensed under a Creative Commons Attribution 4.0 International License.