Montreal AI Ethics Institute

Democratizing AI ethics literacy


The Ethical Implications of Generative Audio Models: A Systematic Literature Review

August 13, 2023

🔬 Research Summary by Julia Barnett, a PhD student in Technology and Social Behavior, a dual PhD program in computer science and communications at Northwestern University, whose research aims to reduce the socio-technical harms of algorithmic systems.

[Original paper by Julia Barnett]


Overview: This paper analyzes an exhaustive set of 884 papers in the generative audio domain to quantify how generative audio researchers discuss the potential negative impacts of their work and to catalog the types of impacts being considered. Jarringly, fewer than 10% of works discuss any potential negative impacts—particularly worrying because the papers that do so raise serious ethical implications and concerns relevant to the broader field, such as the potential for fraud, deepfakes, and copyright infringement.


Introduction

Generative audio modeling is a growing area of research, with recent models achieving human-like quality in their audio output for both music and speech generation. Recent work needs only 10 seconds of a speaker’s voice to create high-quality, realistic text-to-speech generation (Kim et al. 2022) that could easily be used in deepfakes or phishing. In generative music, a new model from Google can now produce new pieces of music from highly detailed text descriptions (Agostinelli et al., 2023). However, it is difficult to determine when these models produce outputs substantially similar to their training data, which may include copyrighted works or songs scraped from artists without their consent.

The creators of both of these models announced that they had no intention of releasing their models to the public due to the strong potential for misuse. However, they are in the small minority of generative audio researchers who discuss any potential negative impacts or ethical considerations of their creations: of the 171 full-text research papers analyzed in this study, they were among the mere 9% that mentioned negative impacts even once.

Key Insights

What Are Generative Audio Models?

Generative models have been a major focus of AI research over the past few years, and society has recently seen them first-hand in public-facing systems like ChatGPT for text and DALL-E 2 for images, but generative audio models have fallen somewhat under the public radar. At their core, generative models use large amounts of training data to produce outputs that are similar to, and statistically likely to exist in, the data they were trained on; generative audio models typically train on a music database to create new songs or a speech database to create human-sounding speech. Two audio models you can experiment with that you may not have heard of are MusicLM and AudioLM, a text-to-music model and a speech generation model, respectively.
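The core idea—learn what is statistically likely in the training data, then sample something similar—can be illustrated with a toy example. The sketch below is a first-order Markov chain over note names; it is purely illustrative and bears no resemblance to the architecture of MusicLM, AudioLM, or any model discussed in the paper.

```python
import random

# Toy illustration of a generative model: a first-order Markov chain
# "trained" on a short melody. Real generative audio models are vastly
# more complex, but the principle is the same: learn the statistics of
# the training data, then sample new output that fits those statistics.

training_melody = ["C", "E", "G", "E", "C", "E", "G", "A", "G", "E", "C"]

# Count which note tends to follow which in the training data.
transitions = {}
for prev, nxt in zip(training_melody, training_melody[1:]):
    transitions.setdefault(prev, []).append(nxt)

def generate(start="C", length=8, seed=0):
    """Sample a new melody statistically similar to the training data."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        melody.append(rng.choice(transitions[melody[-1]]))
    return melody

print(generate())
```

Because sampling is weighted by observed transition counts, the generated melody stays within the idiom of the training data—which is also a miniature version of the memorization and bias concerns discussed below: the model can only reproduce what it has seen.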

Research papers in this domain often have one big gap: they tend not to discuss potential negative impacts. This is not for lack of considering any impact at all; the author found that 65% of the papers analyzed discussed some potential positive implications of their work. They simply neglected to mention any potential negative impacts.

Different Negative Impacts of Generative Audio

In addition to quantifying the degree to which researchers in the field discuss ethics and negative impacts, the author also cataloged the different types of ethical considerations discussed in the small percentage of papers that did so. These are split into negative broader impacts in generative music models, in generative speech models, and those present in both areas.
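To give a flavor of what quantifying impact discussion across a corpus involves, here is a minimal sketch. It is not the author's actual methodology (the study involved analysis of 884 papers, 171 of them full-text); the keyword list and mini-corpus below are invented for illustration.

```python
# Illustrative sketch only: a naive keyword pass over a made-up corpus,
# showing the kind of quantification a systematic review involves.
# The real study's coding of 884 papers was far more careful than this.

NEGATIVE_IMPACT_TERMS = {"misuse", "deepfake", "copyright", "bias",
                         "fraud", "harm", "negative impact"}

def mentions_negative_impact(text: str) -> bool:
    """Flag a paper if it mentions any negative-impact term."""
    text = text.lower()
    return any(term in text for term in NEGATIVE_IMPACT_TERMS)

# Hypothetical abstracts, not real papers.
corpus = [
    "We present a text-to-speech model with state-of-the-art naturalness.",
    "Our music generator raises copyright and misuse concerns, discussed below.",
    "A faster vocoder architecture for real-time synthesis.",
]

flagged = sum(mentions_negative_impact(p) for p in corpus)
print(f"{flagged}/{len(corpus)} papers mention potential negative impacts")
```

A keyword pass like this would over- and under-count in practice, which is precisely why studies of this kind rely on human coding of full texts.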

Generative Music

One of the most important considerations of generative music models—ethically and potentially legally—is copyright infringement. It is widely established that generative models can memorize and reproduce information from their training data, and it stands to reason that these models could recreate copyrighted material.

The most common potential negative impact discussed in the corpus was the stifling of creativity by AI music generation. This concern centered on the repetitive nature of generated music and the worry that limiting creative output to the possibilities of a model may impose a similar bound on human creativity. Another issue concerns the loss of agency and authorship that human creators can feel when making music with the assistance of a generative AI model.

Machine learning models often perpetuate biases present in their training data, and generative models are no different. It is important to be aware of the composition of the training data to understand which biases could be perpetuated; models trained on Western music will perpetuate the biases of Western culture. Additionally, generative audio models sometimes train on enormous amounts of data, and it follows that some of this data comes from cultures other than those of the model’s creators or users. A fundamental lack of understanding of attribution in these models can result in cultural appropriation when the training data contains content from marginalized communities.

Generative Speech

Models that can accurately recreate human-sounding voices, especially those of a targeted speaker (think: someone’s child or grandmother), have enormous potential for misuse in phishing and fraud. Some of these models need only 10 seconds of someone’s voice to train on. A slightly nuanced aspect of this ability to impersonate arises when the victim is famous: the misuse can take the form of misinformation or deepfakes. As these models become easier to use, the prevalence of deepfakes and misinformation online will continue to grow. There are also security and privacy concerns in the form of machine-induced audio attacks on intelligent audio systems, such as hidden voice commands that can manipulate voice-protected or voice-operated systems.

All Audio Models

One concern shared by all generative audio models is energy consumption. A generative model consumes energy in two ways: training the model and generating samples. Current research suggests that machine learning models risk contributing significantly to climate change, and it proposes that the total energy consumption and carbon emissions of training these models be reported alongside the standard suite of metrics such as accuracy and speed.
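The kind of reporting advocated here is simple arithmetic once the hardware figures are known. The sketch below shows the calculation; every number in it is a hypothetical placeholder, not a figure from the paper or from any real training run, and real grid carbon intensity varies widely by region.

```python
# Back-of-the-envelope estimate of training energy and emissions.
# All numbers are hypothetical placeholders for illustration only.

gpu_power_kw = 0.3        # average draw per GPU in kW (assumed)
num_gpus = 64             # cluster size (assumed)
training_hours = 24 * 7   # one week of training (assumed)
carbon_intensity = 0.4    # kg CO2e per kWh; varies widely by grid (assumed)

# Energy consumed over the whole training run.
energy_kwh = gpu_power_kw * num_gpus * training_hours

# Emissions implied by that energy on the assumed grid.
emissions_kg = energy_kwh * carbon_intensity

print(f"Energy: {energy_kwh:,.0f} kWh")
print(f"Emissions: {emissions_kg:,.0f} kg CO2e")
```

Reporting two numbers like these alongside accuracy and speed is exactly the lightweight disclosure the cited research proposes; the same formula applies to the per-sample cost of generation.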

There are certainly more ethical considerations than those detailed above, but these are some of the main ones already being discussed by researchers in the field. We can and should continue to build this list of considerations as we continue to build these models.

Between the lines

Generative audio models are not going away—they will only continue growing. It is essential to consider these impacts at all stages of research: during the design process, the implementation of these models, and their publication and publicization. Two papers in the corpus explicitly stated that they did not intend to release their models or code due to the potential for misuse by bad actors. This is an option model creators should weigh, and one that should not be taken lightly.

This is an agenda-setting paper at the right time. It is important both to diagnose the degree to which research papers on generative audio models discuss ethics and to encourage the many researchers to come to contemplate these broader negative impacts in their future work, before the field becomes saturated with studies lacking ethical consideration.

Want quick summaries of the latest research & reporting in AI ethics delivered to your inbox? Subscribe to the AI Ethics Brief. We publish bi-weekly.



  • © 2025 MONTREAL AI ETHICS INSTITUTE.
  • This work is licensed under a Creative Commons Attribution 4.0 International License.