Montreal AI Ethics Institute

Democratizing AI ethics literacy

Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow

August 13, 2023

🔬 Research Summary by Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs.

MdRC is a JSMF research fellow at the Complexity Science Hub in Vienna and a visiting scholar at the Harvard Kennedy School.

NL is an Assistant Professor at the Faculty of Economics, LMU Munich, working at the Chair of Organizational Economics.

JW is an associate professor at Corvinus University of Budapest and a senior research fellow at the Hungarian Centre for Economic and Regional Studies.

[Original paper by R. Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs]


Overview: The widespread adoption of large language models can substitute for public knowledge sharing in online communities. This paper finds that the release of ChatGPT led to a significant decrease in content creation on Stack Overflow, the biggest question-and-answer (Q&A) community for computer programming. We argue that this increasingly displaced content is an important public good, which provides essential information for learners, both human and artificial. 


Introduction

Have you ever wondered how the rise of AI language models like ChatGPT might change how we share and access information online? In our recent study, we measured the impact of ChatGPT on Stack Overflow, a popular online platform where computer programmers ask and answer questions, forming a library of content that anyone with an internet connection can learn from.

To investigate the consequences of AI adoption for digital public goods, we analyzed the activity on Stack Overflow before and after the release of ChatGPT. We compared this with the activity on Mathematics Stack Exchange and MathOverflow, two Stack Exchange platforms for which ChatGPT is less able to answer questions, as well as the Russian and Chinese language versions of Stack Overflow, for whose users ChatGPT was harder to access.
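The before-and-after comparison against less affected platforms can be illustrated with a simple difference-in-differences style calculation on log weekly post counts. This is only a minimal sketch under stated assumptions: the function names and all numbers below are hypothetical and made up for illustration, and the calculation is a plain two-group, two-period comparison, not the regression model estimated in the paper.

```python
import math
from statistics import mean


def did_log_effect(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences on mean log weekly post counts.

    Returns the change in the treated platform's mean log posts
    minus the change in the control platforms' mean log posts.
    """
    def mean_log(counts):
        return mean(math.log(c) for c in counts)

    treated_change = mean_log(treated_post) - mean_log(treated_pre)
    control_change = mean_log(control_post) - mean_log(control_pre)
    return treated_change - control_change


def relative_change(log_effect):
    """Convert a log-point effect into a relative (percentage) change."""
    return math.exp(log_effect) - 1.0


# Hypothetical weekly post counts, invented for illustration only.
treated_pre = [1000, 1000, 1000]   # affected platform, before the release
treated_post = [700, 700, 700]     # affected platform, after the release
control_pre = [200, 200, 200]      # less affected platforms, before
control_post = [180, 180, 180]     # less affected platforms, after

effect = did_log_effect(treated_pre, treated_post, control_pre, control_post)
print(f"Relative change versus controls: {relative_change(effect):.1%}")
```

Working in logs means the comparison is about proportional changes, so platforms of very different sizes can be set side by side. Subtracting the control change removes trends that affect all platforms alike, such as seasonal dips in posting.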

We observe a 16% decrease in activity on Stack Overflow relative to the less affected platforms following the release of ChatGPT. The effect’s magnitude increases over time, reaching 25% by the end of May 2023. While we do not find major differences in the quality of displaced content, we note significant heterogeneity between programming languages, with more popular ones being more strongly affected. This suggests that as more people turn to AI for answers, less knowledge is publicly shared. 

Key Insights

The Impact of AI on Digital Public Goods

Large language models (LLMs) like ChatGPT can provide users with information on various topics, making them a convenient alternative to traditional web searches or online Q&A communities. But what happens to the wealth of human-generated data on the web when more people start turning to AI for answers?

Our research focused on this question, investigating the potential impact of AI language models on digital public goods. Digital public goods, in this context, refer to the vast library of human-generated data and knowledge resources available on the web. These resources, such as the information shared on platforms like Stack Overflow, Wikipedia, or Reddit, serve as crucial resources for learning, problem-solving, and even training future AI models.

The Case of Stack Overflow

We focused on Stack Overflow because it is a rich source of human-generated data, with tens of millions of posts since its launch in 2008. It covers a wide range of programming languages and topics. Moreover, LLMs, in general, are relatively good at coding. We analyzed posting activity on Stack Overflow before and after the release of ChatGPT, comparing the changes on Stack Overflow against similar platforms where ChatGPT was less likely to make an impact. Specifically, we considered the Russian- and Chinese-language counterparts to Stack Overflow, because ChatGPT is not available in Russia or China, as well as question-and-answer communities focusing on advanced mathematics, a subject with which ChatGPT cannot (yet) provide much help.

Findings: A Decrease in Activity

Our analysis revealed a significant decrease in activity on Stack Overflow following the release of ChatGPT compared to the control platforms. We estimate a 16% relative decrease in weekly posts since the release of ChatGPT, with the effect’s magnitude reaching a 25% decrease by June 2023. Interestingly, the decrease in activity was not limited to duplicate or low-quality content. We found that posts made after ChatGPT’s release received similar positive and negative voting scores to those made before, indicating that high-quality content was also being displaced.

Furthermore, the impact of ChatGPT varied across different programming languages, with ChatGPT being a better substitute for Stack Overflow when more training data had been available. Accordingly, posting activity in popular languages, like Python and JavaScript, decreased significantly more than the global site average.
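As a rough illustration of what "decreased more than the global site average" means, the sketch below compares each tag's relative change in mean weekly posts with the change across all tags combined. Everything here is an assumption made for illustration: the tag list, the counts, and the helper names are hypothetical, and this descriptive arithmetic ignores the control platforms, so it is not the analysis performed in the paper.

```python
from statistics import mean


def relative_change(pre, post):
    """Relative change in mean weekly posts from the pre to the post period."""
    return mean(post) / mean(pre) - 1.0


def site_wide_change(weekly_posts):
    """Relative change in weekly posts summed over all tags."""
    pre_totals = [sum(week) for week in zip(*(v["pre"] for v in weekly_posts.values()))]
    post_totals = [sum(week) for week in zip(*(v["post"] for v in weekly_posts.values()))]
    return relative_change(pre_totals, post_totals)


def tags_more_affected(weekly_posts):
    """Tags whose posting activity fell by more than the site-wide change."""
    baseline = site_wide_change(weekly_posts)
    return sorted(
        tag
        for tag, v in weekly_posts.items()
        if relative_change(v["pre"], v["post"]) < baseline
    )


# Hypothetical weekly post counts per tag, invented for illustration only.
weekly_posts = {
    "python": {"pre": [500, 520, 480], "post": [390, 400, 380]},
    "javascript": {"pre": [400, 410, 390], "post": [320, 330, 310]},
    "r": {"pre": [100, 100, 100], "post": [90, 90, 90]},
    "fortran": {"pre": [10, 12, 8], "post": [10, 9, 11]},
}

for tag in tags_more_affected(weekly_posts):
    change = relative_change(weekly_posts[tag]["pre"], weekly_posts[tag]["post"])
    print(f"{tag}: {change:.1%} (site-wide: {site_wide_change(weekly_posts):.1%})")
```

Because the largest tags dominate the site-wide total, a tag has to fall by more than that volume-weighted average to be flagged, which is why smaller tags with modest declines are not listed in this toy data.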

Implications: A Shift in Information Exchange

Why is this displacement important? We discuss four implications for the field of artificial intelligence in our paper. First, if language models crowd out open data, they will be limiting their own future training data. A growing body of literature suggests that LLMs cannot learn effectively from the content they generate. In this way, successful LLMs may ironically be limiting their future training sources.

Second, current leaders like OpenAI are accumulating a significant advantage over competitors: their models can learn from their user inputs and feedback while they drain the pool of open data. Third, the shift from a public provision of information on the web to a private one may have significant economic consequences, for instance, by amplifying inequalities or limiting the ability of people and firms to signal their abilities.

Finally, while AI models like ChatGPT offer efficiency and convenience, we know that centralized information sources have drawbacks. For instance, when the web and search engines made it easier to search for scientific journals, researchers started citing more recent papers and fewer journals as their sources. Such a narrowing of our collective focus and attention, even if it moves towards relatively high-quality information, limits the diversity of signals we are exposed to and may lead to suboptimal conformity. More generally, it is unclear how LLMs will help us deal with new problems as the world changes.

Between the lines

Our findings present a mixed picture of the interplay between AI and human-generated digital content. While language models like ChatGPT offer undeniable benefits in terms of efficiency and convenience, our research suggests that their widespread adoption could have unintended consequences for the richness and diversity of our shared digital knowledge base. More work is needed to tease out the heterogeneities of the impact of LLMs on digital public goods and how different platforms and communities are affected.

But the big open question is what we should do about this. Can we better incentivize or give credit for contributions to digital public goods? Can we empower people who create data that platforms, firms, and models use to capture some of that value? It seems to us that for the sake of an open and intellectually diverse web, we must address these questions.

  • © MONTREAL AI ETHICS INSTITUTE. All rights reserved 2024.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.