Montreal AI Ethics Institute

Democratizing AI ethics literacy


Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow

August 13, 2023

🔬 Research Summary by Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs.

MdRC is a JSMF research fellow at the Complexity Science Hub in Vienna and a visiting scholar at the Harvard Kennedy School.

NL is an Assistant Professor at the Faculty of Economics, LMU Munich, working at the Chair of Organizational Economics.

JW is an associate professor at Corvinus University of Budapest and a senior research fellow at the Hungarian Centre for Economic and Regional Studies.

[Original paper by R. Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs]


Overview: The widespread adoption of large language models may substitute for public knowledge sharing in online communities. This paper finds that the release of ChatGPT led to a significant decrease in content creation on Stack Overflow, the largest question-and-answer (Q&A) community for computer programming. We argue that this increasingly displaced content is an important public good, providing essential information for learners, both human and artificial.


Introduction

Have you ever wondered how the rise of AI language models like ChatGPT might change how we share and access information online? In our recent study, we measured the impact of ChatGPT on Stack Overflow. On this popular online platform, computer programmers ask and answer questions, forming a library of content that anyone with an internet connection can learn from.

To investigate the consequences of AI adoption for digital public goods, we analyzed activity on Stack Overflow before and after the release of ChatGPT. We compared this with activity on Mathematics Stack Exchange and MathOverflow, two Stack Exchange platforms for which ChatGPT is less able to answer questions, as well as with the Russian- and Chinese-language versions of Stack Overflow, whose users found ChatGPT harder to access.
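The comparison described above follows standard difference-in-differences logic: the change on the treated platform (Stack Overflow) is measured against the change on less-affected control platforms over the same period. As a rough illustration, with made-up weekly post counts and a simple two-period setup rather than the paper's actual data or full regression model:

```python
import math

# Hypothetical difference-in-differences sketch of the comparison the study
# describes: Stack Overflow (treated) vs. less-affected control platforms.
# All numbers below are invented for illustration; they are NOT the paper's data.

treated = {"before": 100_000, "after": 80_000}  # avg weekly posts, Stack Overflow
control = {"before": 10_000, "after": 9_500}    # avg weekly posts, control sites

# Log changes approximate percentage changes within each group.
treated_change = math.log(treated["after"] / treated["before"])
control_change = math.log(control["after"] / control["before"])

# The difference-in-differences estimate isolates the relative drop on the
# treated platform beyond any trend shared with the controls.
did = treated_change - control_change
print(f"Relative change on treated platform: {did:.1%}")  # ≈ -17.2% with these numbers
```

Differencing against the controls nets out platform-wide trends (seasonality, general shifts in developer activity) that would otherwise be misattributed to ChatGPT.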

We observe a 16% decrease in activity on Stack Overflow relative to the less affected platforms following the release of ChatGPT. The effect’s magnitude increases over time, reaching 25% by the end of May 2023. While we do not find major differences in the quality of displaced content, we note significant heterogeneity between programming languages, with more popular ones being more strongly affected. This suggests that as more people turn to AI for answers, less knowledge is publicly shared. 

Key Insights

The Impact of AI on Digital Public Goods

Large language models (LLMs) like ChatGPT can provide users with information on various topics, making them a convenient alternative to traditional web searches or online Q&A communities. But what happens to the wealth of human-generated data on the web when more people start turning to AI for answers?

Our research focused on this question, investigating the potential impact of AI language models on digital public goods. Digital public goods, in this context, refer to the vast library of human-generated data and knowledge resources available on the web. These resources, such as the information shared on platforms like Stack Overflow, Wikipedia, or Reddit, serve as crucial resources for learning, problem-solving, and even training future AI models.

The Case of Stack Overflow

We focused on Stack Overflow because it is a rich source of human-generated data, with tens of millions of posts covering a wide range of programming languages and topics since its launch in 2008. Moreover, LLMs are, in general, relatively good at coding. We analyzed posting activity on Stack Overflow before and after the release of ChatGPT, comparing the changes against similar platforms where ChatGPT was less likely to make an impact. Specifically, we considered the Russian- and Chinese-language counterparts to Stack Overflow, because ChatGPT is not available in Russia or China, and question-and-answer communities focused on advanced mathematics, which ChatGPT cannot (yet) provide much help with.

Findings: A Decrease in Activity

Our analysis revealed a significant decrease in activity on Stack Overflow following the release of ChatGPT compared to the control platforms. We estimate a 16% relative decrease in weekly posts since the release of ChatGPT, with the effect's magnitude reaching a 25% decrease by June 2023. Interestingly, the decrease in activity was not limited to duplicate or low-quality content. We found that posts made after ChatGPT's release received positive and negative voting scores similar to those made before, indicating that high-quality content was also being displaced.

Furthermore, the impact of ChatGPT varied across programming languages, with ChatGPT being a better substitute for Stack Overflow when more training data had been available. Accordingly, posting activity in popular languages, like Python and JavaScript, decreased significantly more than the global site average.

Implications: A Shift in Information Exchange

Why is this displacement important? We discuss four implications for the field of artificial intelligence in our paper. First, if language models crowd out open data, they will limit their own future training data. A growing body of literature suggests that LLMs cannot learn effectively from the content they themselves generate. In this way, successful LLMs may ironically be limiting their future training sources.

Second, current leaders like OpenAI are accumulating a significant advantage over competitors: their models can learn from their user inputs and feedback while they drain the pool of open data. Third, the shift from a public provision of information on the web to a private one may have significant economic consequences, for instance, by amplifying inequalities or limiting the ability of people and firms to signal their abilities.

Finally, while AI models like ChatGPT offer efficiency and convenience, we know that centralized information sources have drawbacks. For instance, when the web and search engines made it easier to search for scientific journals, researchers started citing more recent papers and fewer journals as their sources. Such a narrowing of our collective focus and attention, even if it moves towards relatively high-quality information, limits the diversity of signals we are exposed to and may lead to suboptimal conformity. More generally, it is unclear how LLMs will help us deal with new problems as the world changes.

Between the lines

Our findings present a mixed picture of the interplay between AI and human-generated digital content. While language models like ChatGPT offer undeniable benefits in terms of efficiency and convenience, our research suggests that their widespread adoption could have unintended consequences for the richness and diversity of our shared digital knowledge base. More work is needed to tease out the heterogeneities of the impact of LLMs on digital public goods and how different platforms and communities are affected.

But the big open question is what we should do about this. Can we better incentivize or give credit for contributions to digital public goods? Can we empower people who create data that platforms, firms, and models use to capture some of that value? It seems to us that for the sake of an open and intellectually diverse web, we must address these questions.



  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.