Montreal AI Ethics Institute


Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow

August 13, 2023

🔬 Research Summary by Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs.

MdRC is a JSMF research fellow at the Complexity Science Hub in Vienna and visiting scholar at the Harvard Kennedy School

NL is an Assistant Professor at the Faculty of Economics, LMU Munich, working at the Chair of Organizational Economics.

JW is an associate professor at Corvinus University of Budapest and a senior research fellow at the Hungarian Centre for Economic and Regional Studies.

[Original paper by R. Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs]


Overview: The widespread adoption of large language models may substitute for public knowledge sharing in online communities. This paper finds that the release of ChatGPT led to a significant decrease in content creation on Stack Overflow, the largest question-and-answer (Q&A) community for computer programming. We argue that this increasingly displaced content is an important public good that provides essential information for learners, both human and artificial.


Introduction

Have you ever wondered how the rise of AI language models like ChatGPT might change how we share and access information online? In our recent study, we measured the impact of ChatGPT on Stack Overflow. On this popular online platform, computer programmers ask and answer questions, forming a library of content that anyone with an internet connection can learn from.

To investigate the consequences of AI adoption for digital public goods, we analyzed activity on Stack Overflow before and after the release of ChatGPT. We compared this with activity on Mathematics Stack Exchange and MathOverflow, two Stack Exchange platforms whose questions ChatGPT is less able to answer, as well as with the Russian- and Chinese-language versions of Stack Overflow, whose users found ChatGPT harder to access.

We observe a 16% decrease in activity on Stack Overflow relative to the less affected platforms following the release of ChatGPT. The effect’s magnitude increases over time, reaching 25% by the end of May 2023. While we do not find major differences in the quality of displaced content, we note significant heterogeneity between programming languages, with more popular ones being more strongly affected. This suggests that as more people turn to AI for answers, less knowledge is publicly shared. 
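This before/after comparison against less-affected control platforms is, in essence, a difference-in-differences design. The sketch below illustrates the idea with invented weekly post counts; the function and all numbers are ours for illustration, not the paper's data or code:

```python
# Difference-in-differences sketch: compare the change on the treated platform
# (Stack Overflow) to the change on a control platform around the ChatGPT release.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Relative DiD effect in percent: treated change minus control change."""
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(treated_post) / mean(treated_pre) - 1
    control_change = mean(control_post) / mean(control_pre) - 1
    return (treated_change - control_change) * 100

# Toy weekly post counts: the treated platform drops while the control is flat.
effect = did_estimate(
    treated_pre=[100, 102, 98], treated_post=[82, 84, 86],
    control_pre=[50, 51, 49],   control_post=[50, 49, 51],
)
print(round(effect, 1))  # roughly -16 with these toy numbers
```

With these invented numbers the treated platform falls while the control is flat, so the whole decline is attributed to the treatment; the paper's actual estimates come from regressions on real posting data.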

Key Insights

The Impact of AI on Digital Public Goods

Large language models (LLMs) like ChatGPT can provide users with information on various topics, making them a convenient alternative to traditional web searches or online Q&A communities. But what happens to the wealth of human-generated data on the web when more people start turning to AI for answers?

Our research focused on this question, investigating the potential impact of AI language models on digital public goods. Digital public goods, in this context, refer to the vast library of human-generated data and knowledge resources available on the web. These resources, such as the information shared on platforms like Stack Overflow, Wikipedia, or Reddit, are crucial for learning, problem-solving, and even training future AI models.

The Case of Stack Overflow

We focused on Stack Overflow because it is a rich source of human-generated data, with tens of millions of posts since its launch in 2008, covering a wide range of programming languages and topics. Moreover, LLMs are, in general, relatively good at coding. We analyzed posting activity on Stack Overflow before and after the release of ChatGPT, comparing the changes against similar platforms where ChatGPT was less likely to have an impact. Specifically, we considered the Russian- and Chinese-language counterparts to Stack Overflow, because ChatGPT is not available in Russia or China, and Q&A communities focused on advanced mathematics, with which ChatGPT cannot (yet) provide much help.

Findings: A Decrease in Activity

Our analysis revealed a significant decrease in activity on Stack Overflow following the release of ChatGPT compared to the control platforms. We estimate a 16% relative decrease in weekly posts since the release of ChatGPT, with the effect’s magnitude reaching a 25% decrease by June 2023. Interestingly, the decrease in activity was not limited to duplicate or low-quality content. We found that posts made after ChatGPT received similar positive and negative voting scores to those made before, indicating that high-quality content was also being displaced.

Furthermore, the impact of ChatGPT varied across programming languages: ChatGPT is a better substitute for Stack Overflow where more training data was available. Accordingly, posting activity in popular languages, such as Python and JavaScript, decreased significantly more than the global site average.
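The heterogeneity analysis amounts to computing the relative change in posting activity per language tag. A minimal sketch with hypothetical counts (the tags and numbers below are ours for illustration, not the paper's data):

```python
# Hypothetical weekly post counts per tag before and after the ChatGPT release.
posts_pre  = {"python": 1000, "javascript": 900, "fortran": 40}
posts_post = {"python":  750, "javascript": 700, "fortran": 38}

# Percent change per tag, rounded to one decimal place.
pct_change = {
    tag: round((posts_post[tag] / posts_pre[tag] - 1) * 100, 1)
    for tag in posts_pre
}
print(pct_change)
```

In this sketch the popular tags decline far more than the niche one, mirroring the paper's finding that languages with more available training data saw larger drops.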

Implications: A Shift in Information Exchange

Why does this displacement matter? We discuss four implications for the field of artificial intelligence in our paper. First, if language models crowd out open data, they limit their own future training data. A growing body of literature suggests that LLMs cannot learn effectively from the content they generate. In this way, successful LLMs may ironically be limiting their future training sources.

Second, current leaders like OpenAI are accumulating a significant advantage over competitors: their models can learn from their user inputs and feedback while they drain the pool of open data. Third, the shift from a public provision of information on the web to a private one may have significant economic consequences, for instance, by amplifying inequalities or limiting the ability of people and firms to signal their abilities.

Finally, while AI models like ChatGPT offer efficiency and convenience, we know that centralized information sources have drawbacks. For instance, when the web and search engines made it easier to search for scientific journals, researchers started citing more recent papers and fewer journals as their sources. Such a narrowing of our collective focus and attention, even if it moves towards relatively high-quality information, limits the diversity of signals we are exposed to and may lead to suboptimal conformity. More generally, it is unclear how LLMs will help us deal with new problems as the world changes.

Between the lines

Our findings present a mixed picture of the interplay between AI and human-generated digital content. While language models like ChatGPT offer undeniable benefits in terms of efficiency and convenience, our research suggests that their widespread adoption could have unintended consequences for the richness and diversity of our shared digital knowledge base. More work is needed to tease out the heterogeneities of the impact of LLMs on digital public goods and how different platforms and communities are affected.

But the big open question is what we should do about this. Can we better incentivize or give credit for contributions to digital public goods? Can we empower people who create data that platforms, firms, and models use to capture some of that value? It seems to us that for the sake of an open and intellectually diverse web, we must address these questions.

  • © MONTREAL AI ETHICS INSTITUTE. All rights reserved 2024.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.